(57) Abstract: A method of analyzing patterns. 
The method comprises: receiving a first diffraction 
pattern; receiving a second diffraction pattern; 
receiving a third diffraction pattern; determining a 
similarity between the first and second diffraction 
patterns; determining a similarity between the 
first and third diffraction patterns; determining a 
similarity between the second and third diffraction 
patterns; and performing hierarchical cluster 
analysis on the first and second diffraction patterns 
based on the determined similarity. Method of 
matching X-ray diffraction patterns using the 
fundamental parameter (FP) method. Patterns 
may be matched by identifying peaks within 
the patterns and matching the patterns based on 
the identified peaks or by matching the intensity 
envelopes of the patterns. The similarities between 
the patterns are expressed by scores which may 
be used to perform hierarchical cluster analysis 
(HCA) on the patterns to yield a dendrogram. 

METHOD OF COMPARING X-RAY DIFFRACTION PATTERNS USING THE FUNDAMENTAL 
PARAMETER METHOD 

Description 

Related Applications 

This application claims priority from U.S. Provisional Patent Application 
No. 60/401,811, filed August 6, 2002, which is incorporated herein by reference. 

Technical Field 

The invention relates to the field of pattern matching, and more specifically, 
a system for and method of matching diffraction patterns utilizing hierarchical cluster 
analysis. 

Background 

Diffraction is frequently used as an analytical technique to characterize 
compounds or elements. There are situations where a number of materials are analyzed by 
diffraction techniques and compared to one another in order to determine whether 
differences in the materials exist. For example, production lots of a compound might be 
analyzed by diffraction to ensure that the desired material is produced. As another example, 
a compound might be crystallized under a variety of conditions and the resulting solids 
analyzed by diffraction to determine if variations in solid form are present. As a third 
example, an ionizable compound might be reacted with a number of different counterions in 
an effort to generate a group of different salts. In this case, the solids from the reactions 
could be analyzed by diffraction and compared to diffraction analyses of the original 
material and the counterion to help determine whether a salt was formed. It would be useful 
to have a tool to quickly, easily, and accurately compare diffraction patterns of different 
materials and sort them into groups of similar patterns. 

Hierarchical Cluster Analysis (HCA) is a statistical method of pattern recognition 
with wide applicability. HCA is a common method of choice whenever the goal is to cluster 
relatively similar objects together into different groups. The core 
requirement of HCA is the derivation of a measure of similarity between the objects being 
clustered. The success of the HCA approach is dependent on the robustness of the measure 
of similarity chosen. The early implementations of HCA were in statistical data analysis, where 
the measure of similarity was the numerical equivalency of the results being analyzed. This 
approach has seen wide application in statistical quantitative analysis. 

The use of HCA for clustering objects more complex than quantitative values 
has been limited by the availability of a suitable measure of similarity between the objects 
to be clustered. The appropriate choice of a measure of similarity is not obvious. 

The present invention is directed to overcoming one or more of the above 
problems and achieving one or more of the above stated goals. 

Summary 

Consistent with the present invention, a method of analyzing patterns is 
provided. The method comprises: receiving a first diffraction pattern; receiving a second 
diffraction pattern; receiving a third diffraction pattern; determining a first similarity 
between the first and the second diffraction patterns; determining a second similarity 
between the first and the third diffraction patterns; determining a third similarity between 
the second and the third diffraction patterns; and performing hierarchical cluster analysis on 
the first, the second, and the third diffraction pattern based on the determined first 
similarity, the second similarity, and the third similarity. 

Further consistent with the present invention, a system for analyzing patterns 
is provided. The system comprises: a memory; and a processor coupled to the memory. 
The processor is for: receiving a first diffraction pattern; receiving a second diffraction 
pattern; receiving a third diffraction pattern; determining a first similarity between the first 
and the second diffraction patterns; determining a second similarity between the first and 
the third diffraction patterns; determining a third similarity between the second and the third 
diffraction patterns; and performing hierarchical cluster analysis on the first, the second, and 
the third diffraction pattern based on the determined first similarity, the second similarity, 
and the third similarity. 

Further consistent with the present invention, a machine-readable magnetic 
medium comprising instructions stored on the medium is provided. The instructions, when 
executed, perform the stages of: receiving a first diffraction pattern; receiving a second 
diffraction pattern; receiving a third diffraction pattern; determining a first similarity 
between the first and the second diffraction patterns; determining a second similarity 
between the first and the third diffraction patterns; determining a third similarity between 
the second and the third diffraction patterns; and performing hierarchical cluster analysis on 
the first, the second, and the third diffraction pattern based on the determined first 
similarity, the second similarity, and the third similarity. 

Consistent with the present invention, a method of analyzing patterns is 
provided. The method comprises: receiving a first diffraction pattern; receiving a second 
diffraction pattern; receiving a third diffraction pattern; determining a first similarity 
between the first and the second diffraction patterns based on the characteristic peaks of the 
first and the second diffraction patterns; determining a second similarity between the first 
and the third diffraction patterns based on the characteristic peaks of the first and the third 
diffraction patterns; determining a third similarity between the second and the third 
diffraction patterns based on the characteristic peaks of the second and the third diffraction 
patterns; and performing hierarchical cluster analysis on the first, the second, and the third 
diffraction pattern based on the determined first, the second, and the third similarity. 

Further consistent with the present invention, a system for analyzing patterns 
is provided. The system comprises: a memory; and a processor coupled to the memory. 
The processor is for: receiving a first diffraction pattern; receiving a second diffraction 
pattern; receiving a third diffraction pattern; determining a first similarity between the first 
and the second diffraction patterns based on the characteristic peaks of the first and the 
second diffraction patterns; determining a second similarity between the first and the third 
diffraction patterns based on the characteristic peaks of the first and the third diffraction 
patterns; determining a third similarity between the second and the third diffraction patterns 
based on the characteristic peaks of the second and the third diffraction patterns; and 
performing hierarchical cluster analysis on the first, the second, and the third diffraction 
pattern based on the determined first, the second, and the third similarity. 

Further consistent with the present invention, a machine-readable magnetic 
medium comprising instructions stored on the medium is provided. The instructions, when 
executed, perform the stages of: receiving a first diffraction pattern; receiving a second 
diffraction pattern; receiving a third diffraction pattern; determining a first similarity 
between the first and the second diffraction patterns based on the characteristic peaks of the 
first and the second diffraction patterns; determining a second similarity between the first 
and the third diffraction patterns based on the characteristic peaks of the first and the third 
diffraction patterns; determining a third similarity between the second and the third 
diffraction patterns based on the characteristic peaks of the second and the third diffraction 
patterns; and performing hierarchical cluster analysis on the first, the second, and the third 
diffraction pattern based on the determined first, the second, and the third similarity. 

Consistent with the present invention, a method of analyzing patterns is 
provided. The method comprises: receiving a first diffraction pattern; receiving a second 
diffraction pattern; receiving a third diffraction pattern; determining a first similarity 
between the first and the second diffraction patterns based on the intensity envelopes of the 
first and the second diffraction patterns; determining a second similarity between the first 
and the third diffraction patterns based on the intensity envelopes of the first and the third 
diffraction patterns; determining a third similarity between the second and the third 
diffraction patterns based on the intensity envelopes of the second and the third diffraction 
patterns; and performing hierarchical cluster analysis on the first, the second, and the third 
diffraction pattern based on the determined first, the second, and the third similarity. 

Further consistent with the present invention, a system for analyzing patterns 
is provided. The system comprises: a memory; and a processor coupled to the memory. 
The processor is for: receiving a first diffraction pattern; receiving a second diffraction 
pattern; receiving a third diffraction pattern; determining a first similarity between the first 
and the second diffraction patterns based on the intensity envelopes of the first and the 
second diffraction patterns; determining a second similarity between the first and the third 
diffraction patterns based on the intensity envelopes of the first and the third diffraction 
patterns; determining a third similarity between the second and the third diffraction patterns 
based on the intensity envelopes of the second and the third diffraction patterns; and 
performing hierarchical cluster analysis on the first, the second, and the third diffraction 
pattern based on the determined first, the second, and the third similarity. 

Further consistent with the present invention, a machine-readable magnetic 
medium comprising instructions stored on the medium is provided. The instructions, when 
executed, perform the stages of: receiving a first diffraction pattern; receiving a second 
diffraction pattern; receiving a third diffraction pattern; determining a first similarity 
between the first and the second diffraction patterns based on the intensity envelopes of the 
first and the second diffraction patterns; determining a second similarity between the first 
and the third diffraction patterns based on the intensity envelopes of the first and the third 
diffraction patterns; determining a third similarity between the second and the third 
diffraction patterns based on the intensity envelopes of the second and the third diffraction 
patterns; and performing hierarchical cluster analysis on the first, the second, and the third 
diffraction pattern based on the determined first, the second, and the third similarity. 

Further consistent with the present invention, a method of analyzing a pattern 
of a disordered form is provided. The method comprises receiving a diffraction pattern of 
the disordered form; simulating a simulated disordered form based on the peak list of the 
ordered form; and matching the simulated disordered form to the diffraction pattern of the 
disordered form. 

Further consistent with the present invention, a system for analyzing a 
pattern of a disordered form is provided. The system comprises memory coupled to a 
processor, the processor for: receiving a diffraction pattern of the disordered form; 
simulating a simulated disordered form based on the peak list of the ordered form; and 
matching the simulated disordered form to the diffraction pattern of the disordered form. 

Further consistent with the present invention, a machine-readable magnetic 
medium comprising instructions stored on the medium is provided. The instructions, when 
executed, perform the stages of: receiving a diffraction pattern of the disordered form; 
simulating a simulated disordered form based on the peak list of the ordered form; and 
matching the simulated disordered form to the diffraction pattern of the disordered form. 

Further consistent with the present invention, a method is described for 
matching patterns. The method comprises: performing pattern matching on three or more 
patterns to determine similarities between the patterns; and performing hierarchical cluster 
analysis on the three or more patterns based on the determined similarities. 

Brief Description of the Drawings 

The accompanying drawings, which are incorporated in and constitute a part 
of this specification, illustrate a system consistent with the invention and, together with the 
description, serve to explain the principles of the invention. 

Figure 1 is an illustration of a system consistent with the present invention in 
its operating environment. 

Figure 2 is a flowchart of the operation of the Analysis System consistent 
with the present invention. 

Figure 3 is a flowchart of the operation of the peak comparison methodology 
of the Analysis System consistent with the present invention. 

Figure 4 is a flowchart of the peak comparison pre-processing method 
consistent with the present invention. 

Figure 5 is a flowchart of the peak detection method consistent with the 
present invention. 

Figure 6 is a flowchart of the characteristic peak determination method 
consistent with the present invention. 

Figure 7 is a flowchart of the probability assignment method consistent with 
the present invention. 

Figure 8 is a flowchart of the peak pattern matching method consistent with 
the present invention. 

Figure 9 is a flowchart of the peak comparison method consistent with the 
present invention. 

Figure 10 is an illustration of a diffraction pattern analyzed in the present 
invention. 

Figure 11 is an illustration of the diffraction pattern and the diffraction 
pattern baseline determined by methods consistent with the present invention. 

Figure 12 is an illustration of the baseline corrected diffraction pattern 
determined by methods consistent with the present invention. 

Figure 13 is an illustration of the diffraction pattern analyzed by methods 
consistent with the present invention. 

Figure 14 is an illustration of the smoothed diffraction pattern generated by 
methods consistent with the present invention. 

Figure 15 is an illustration of the smoothed, baseline corrected diffraction 
pattern generated by methods consistent with the present invention. 

Figure 16 is an illustration of the smoothed, baseline corrected diffraction 
pattern with the peaks detected and categorized generated by methods consistent with the 
present invention. 

Figure 17 is an illustration of the diffraction pattern with a broad feature 
analyzed by methods consistent with the present invention. 

Figure 18 is an illustration of the diffraction pattern with a broad feature and 
the broad feature detected by methods consistent with the present invention. 

Figure 19 is an illustration of preferred orientation or particle statistics. 

Figure 20 is an illustration of the first smoothed, baseline corrected 
diffraction pattern compared to a second smoothed, baseline corrected diffraction pattern 
consistent with the present invention. 

Figure 21 is an illustration of the missing Group 1 and Group 2 peaks found 
in the first smoothed, baseline corrected diffraction pattern but missing in the second 
smoothed, baseline corrected diffraction pattern consistent with the present invention. 

Figure 22 is an illustration of the missing Group 1 and Group 2 peaks found 
in the second smoothed, baseline corrected diffraction pattern but missing in the first 
smoothed, baseline corrected diffraction pattern consistent with the present invention. 

Figures 23a and 23b illustrate the results of a hierarchical cluster analysis 
generated by methods consistent with the present invention. 

Figure 24 is a flowchart of the operation of the intensity envelope 
comparison methodology of the Analysis System consistent with the present invention. 

Figure 25 is a flowchart of the operation of the intensity envelope 
comparison pre-processing methodology consistent with the present invention. 

Figure 26 is a flowchart of the intensity matching method consistent with the 
present invention. 

Figure 27 is a graph of a sample diffraction pattern and a calculated pattern 
resulting from the least squares fitting of all other patterns consistent with the present 
invention. 

Figure 28 is a graph of a plurality of diffraction patterns analyzed according 
to the intensity envelope comparison method and the resulting least squares analysis 
consistent with the present invention. 

Figure 29 illustrates a disorder simulation algorithm consistent with the 
principles of the present invention. 

Figure 30 illustrates a flowchart of the generation of the simulated disordered 
pattern from the received peak list. 

Figure 31 is a block diagram of an Analysis System consistent with the 
present invention. 

Detailed Description 

Reference will now be made in detail to the present exemplary embodiments 
consistent with the invention, examples of which are illustrated in the accompanying 
drawings. Wherever possible, the same reference numbers will be used throughout the 
drawings to refer to the same or like parts. 

The clustering of measured diffraction patterns from polycharacteristic 
materials, noncrystalline materials, or mixtures is an example of clustering objects where 
the measure of similarity is not obvious and is an area where HCA has not previously been 
applied. Many experimental variables (sample preparation, instrumental variation, random 
noise) make the selection of a robust measure of similarity for diffraction patterns a 
complex procedure. 

Based upon many years of experience in manually clustering 'similar' 
diffraction patterns, a set of heuristic laws has been derived that allows direct quantification 
of the similarity between two or more measured diffraction patterns. This measure of 
similarity may then be used with an HCA procedure to identify groups of relatively similar 
diffraction patterns. 

At least two distinct measures of similarity may be implemented for the 
purpose of clustering diffraction patterns. The first may determine the similarity of 
diffraction patterns according to the 'similarity' of the measured diffraction peaks, while the 
second may determine the similarity of diffraction patterns according to the 'similarity' of 
the measured intensity envelope. 

Crystalline materials with 'similar' crystallographic 
unit cell parameters may generate diffraction patterns with 'similar' measured diffraction 
peak positions. The more similar the crystallographic unit cell parameters, the more similar 
the measured diffraction peak positions. 

Crystalline material with 'similar' molecular or atomic packing motifs may 
generate diffraction patterns with 'similar' measured intensity envelopes within the limits 
imposed by sample preparation variables. The more similar the molecular or atomic packing 
motifs, the more similar the measured intensity envelopes. 

Clustering measured crystalline diffraction patterns based upon the similarity 
of the measured peak positions and intensities allows, therefore, the grouping of samples 
containing predominantly the same crystalline polymorph. That is, the same 
crystallographic unit-cell, the same point group and space group, and the same 
molecular/atomic-packing motif. Samples containing predominantly the same polymorph 
are most likely to exhibit similar chemical behavior. 

Using only the measured intensity envelope as a measure of similarity 
between diffraction patterns allows for the grouping of samples that are iso-structural. 
Iso-structural materials are characterized by similar molecular/atomic-packing motifs but 
differing unit cell parameters. The difference between one iso-structural material and 
another is a difference in unit cell parameters (a symmetry translation) that will not affect 
the chemical properties. Like samples containing the same polymorphs, samples that are 
iso-structural will exhibit similar chemical properties. 

Consistent with the principles of the present invention, systems may be 
utilized, for example, to identify new solid forms of compounds or elements. They may be 
used, for example, to identify new solid forms of known drugs. These new solid forms of 
drugs may provide improved properties, such as improved stability, solubility, 
bioavailability, or handling properties. In order to find a new solid form of a drug, the drug 
may be crystallized in many different ways. For example, hundreds or thousands of 
samples of the drug may be generated by crystallization or solidification using different 
solvents, different temperatures, different humidities, or different pressures. Those skilled 
in the art will appreciate the variety of approaches that may be taken to generate a wide 
variety of solid forms of a material. 

Samples of a material may be, for example, in a crystalline, disordered 
crystalline, polycrystalline, non-crystalline, amorphous, disordered, microcrystalline, 
nanocrystalline, partially amorphous, partially crystalline, semisolid, crystal mesophase, or 
glassy form or mixtures of these forms. Once the samples have been generated, diffraction 
instrumentation may be utilized to analyze the samples and produce diffraction patterns. 
Diffraction patterns may be, for example, neutron diffraction patterns, X-ray diffraction 
patterns, or electron diffraction patterns. Consistent with the present invention, diffraction 
patterns of the samples are compared. The results of the comparison of the patterns may be 
analyzed using hierarchical cluster analysis (HCA) to group the patterns into similar 
clusters. Further information on hierarchical cluster analysis may be found in C. Olson, 
"Parallel Algorithms For Hierarchical Clustering," Parallel Computing, 21:1313-1325, 
1995. Consistent with the principles of the present invention, X-ray diffraction (XRD) and 
HCA may be combined to find new solid forms of materials, including but not limited to 
new solid forms of drugs. 

Figure 1 is an illustration of a system consistent with the present invention in its 
operating environment. Diffraction instrumentation 100 analyzes samples yielding a pattern 
130. Pattern 130 is a graph with degrees along the X-axis and magnitude along the Y-axis. 
Instrumentation 100 may include any type of instrumentation by, for example, 
manufacturers such as Shimadzu, Bruker, or INEL in the case of X-ray powder diffraction. 
Pattern 130 is transferred as pattern data to Analysis System 110. The transfer may be by 
transfer of storage media, such as floppy disk, hard disk, tape, or flash RAM, or by electronic 
means, such as over a Local Area Network, Wide Area Network, the Internet, or point-to-point 
communication via a modem, Firewire, USB, serial, or parallel connection. 

Analysis System 110 may be operated by an Operator 120 or may function 
without the intervention of an operator. Analysis System 110 may perform matching on the 
patterns in order to quantify the similarity between at least a first pattern and a second 
pattern. Consistent with the present invention, each pattern may be compared to every other 
pattern received to generate a quantitative similarity between each pattern and every other 
pattern. Patterns that are identical may be ignored and patterns composed of mixtures of 
other patterns may be determined. 
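
The all-pairs comparison described above can be sketched as follows. This is a minimal illustration only: the cosine score is a stand-in chosen for brevity and is not the peak-based or envelope-based measure of the invention, and the names `cosine_similarity` and `similarity_matrix` are hypothetical. It assumes every pattern has been sampled on the same angular grid.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Stand-in score in [0, 1] for two non-negative intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(patterns, score=cosine_similarity):
    """Compare each pattern to every other pattern exactly once."""
    n = len(patterns)
    sim = [[1.0] * n for _ in range(n)]  # a pattern matches itself perfectly
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = score(patterns[i], patterns[j])
    return sim


# Three toy patterns on a shared four-point grid (illustrative values).
patterns = [
    [0.0, 5.0, 1.0, 0.0],
    [0.0, 4.0, 1.0, 0.0],
    [3.0, 0.0, 0.0, 2.0],
]
sim = similarity_matrix(patterns)
```

The resulting symmetric matrix is the kind of input that the HCA stage consumes; any scoring function with the same two-argument shape could be substituted for the stand-in.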

Analysis System 110 may match pattems by several methods, including: 
identifying peaks within the patterns and matching the patterns based on the identified 
peaks; or matching the intensity envelopes of the patterns. Graph 140 illustrates matching 
two patterns based on identified peaks. Analysis System 110 may quantify the similarities 
between the patterns. The pattern matching scores or similarity scores may be used to 
perform HCA analysis on the patterns to yield a Dendrogram 150. Dendrogram 150 
illustrates the grouping of patterns into clusters of similar forms. This cluster analysis will 
group similar patterns together for further use. 

Figure 2 is a flowchart of the operation of the Analysis System 110 
consistent with the present invention. Analysis System 110 may perform the similarity 
determination and HCA analysis method 200 through one or more of the following 
stages: receiving the patterns, pre-processing the patterns, matching the patterns to generate 
a similarity score between the patterns, and performing hierarchical cluster analysis on the 
patterns based on the similarity scores. At stage 210, the method 200 receives two or more 
patterns. These patterns may be in the form of a graphical image converted to a flat data file 
through image scanning and analysis or may arrive in a flat data file, such as an ASCII 
comma or tab delimited format, SQL data, or spreadsheet data. 
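
As an illustration of receiving a pattern from a delimited flat file at stage 210, the sketch below parses two-column text into angle and intensity lists. The column order (angle first, intensity second), the optional header row, and the function name `read_pattern` are assumptions made for this example; actual instrument exports may differ.

```python
import csv
import io


def read_pattern(text, delimiter=","):
    """Parse rows of (angle, intensity) into two parallel lists of floats."""
    angles, intensities = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            angle, intensity = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip a header or any non-numeric row
        angles.append(angle)
        intensities.append(intensity)
    return angles, intensities


# Comma-delimited input with a header row (illustrative values).
sample = "two_theta,intensity\n5.00,120\n5.02,135\n5.04,410\n"
angles, intensities = read_pattern(sample)

# Tab-delimited input without a header row.
tab_angles, tab_intensities = read_pattern("5.00\t120\n5.02\t135\n", delimiter="\t")
```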

At stage 220, each pattern may be pre-processed. The pre-processing stage 
220 may vary depending on the pattern matching technique utilized later in method 200. 
The pre-processing stage 220, generally, may massage the data to normalize the data, 
remove instrumentation errors and variations, and analyze the data for results used later in 
method 200. 

At stage 230, method 200 may match the patterns to each other to determine 
their similarities. Stage 230 may match peaks within the patterns to determine similarity or 
may match the general intensity envelopes of the patterns to determine similarity. Peak 
matching is useful for identifying similar unit cells and crystal symmetry. Intensity 
envelope matching is useful for identifying isostructures of the crystalline forms and 
clustering disordered forms with ordered forms. 

At stage 240, the results of the matching, i.e. the similarity scores, are 
utilized to perform hierarchical cluster analysis (HCA), described in more detail in the 
following paragraphs. Initially, HCA defines every pattern as a separate cluster. The two 
most similar clusters are aggregated into a cluster. The clustering then repeats until all 
clusters are joined together. The resulting clustering is displayed in a tree structure, known 
as a dendrogram. Figures 23a and 23b, to be discussed later, illustrate an exemplary 
dendrogram. The vertical axis displays each sample. Patterns that are similar cluster 
together toward the left portion of the horizontal axis. As similarity diverges, the clusters 
are grouped together toward the right portion of the horizontal axis. Thus, moving from left 
to right, the horizontal axis displays lesser degrees of similarity. Similarity is relatively 
scaled so that a similarity of 1.0 denotes a perfect match with perfect similarity and a 
similarity of 0.0 denotes the poorest match. 

While those skilled in the art will understand HCA, a short description of a 
basic HCA method follows: Starting with a set of N items (consistent with the present 
invention, N patterns), and an NxN similarity matrix describing the relative similarity of 
each item to each other item, the basic process of HCA is: 

1. Initially assign each item to its own cluster, producing N clusters, 
each containing one item. Let the similarities between the clusters equal the similarities 
between the items they contain. 

2. Find the most similar pair of clusters and merge them into a single 
cluster, resulting in one less cluster (for an initial total of N-1 clusters). 

3. Compute similarities between the new cluster and each of the 
remaining old clusters. 

4. Repeat steps 2 and 3 until all items are clustered into a single cluster 
of size N. Each merge operation can be considered as a branch in a tree of clusters. As 
previously explained, this tree is called a dendrogram and has its root in the final cluster 
that contains all N items. The leaves of the tree are the initial N single item clusters. 
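
The four steps above can be sketched in a few lines. This is an illustrative sketch, not the implementation of the Analysis System: the function name `hca` is hypothetical, single-link (`max`) is assumed as the default way of scoring two clusters, and step 3 is carried out implicitly by recomputing cluster similarities from the item-level matrix on each pass rather than by updating a stored table.

```python
def hca(sim, linkage=max):
    """Agglomerative clustering on an N x N similarity matrix.

    Returns the merges in order as (cluster_a, cluster_b, similarity),
    where each cluster is a tuple of item indices.
    """
    # Step 1: every item starts in its own cluster.
    clusters = [(i,) for i in range(len(sim))]
    merges = []

    def cluster_sim(a, b):
        # Step 3 (implicit): score two clusters from item similarities.
        return linkage(sim[i][j] for i in a for j in b)

    # Step 4: repeat until a single cluster of size N remains.
    while len(clusters) > 1:
        # Step 2: find the most similar pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = cluster_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, s))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a + b)
    return merges


# Toy similarity matrix for four patterns (illustrative values).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
merges = hca(sim)
```

Each entry of `merges` corresponds to one branch of the dendrogram; for N items there are N-1 merges, and the last one is the root.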

Step 3 may be done in different ways, resulting in different cluster distance 
metrics. Some of the most commonly used cluster distance metrics are: single-link, 
complete-link and average-link. In single-link clustering (also called the minimum method), 
the similarity between two clusters is equal to the greatest similarity from any item in one 
cluster to any item in the other cluster. In complete-link clustering (also called the 
maximum method), the similarity between two clusters is equal to the smallest similarity 
from any item in one cluster to any item in the other cluster. In average-link clustering, the 
similarity between two clusters is equal to the average similarity from any item in one 
cluster to any item in the other cluster. HCA may be understood in more detail in the 
following references, each of which is incorporated by reference: Borgatti, S.P., "How to 
Explain Hierarchical Clustering", Connections, 17(2):78-80, 1994; Johnson, S.C., 
"Hierarchical Clustering Schemes", Psychometrika, 2:241-254, 1967; Olson, C., "Parallel 
Algorithms For Hierarchical Clustering", Parallel Computing, 21:1313-1325, 1995. 
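
The three cluster similarity rules described above differ only in how the item-to-item similarities between two clusters are reduced to one number. The sketch below makes that concrete; the function names and the toy matrix are illustrative assumptions, not part of the described system.

```python
from statistics import mean


def pair_similarities(sim, a, b):
    """All item-to-item similarities between cluster a and cluster b."""
    return [sim[i][j] for i in a for j in b]


def single_link(sim, a, b):
    """Greatest similarity between any item in a and any item in b."""
    return max(pair_similarities(sim, a, b))


def complete_link(sim, a, b):
    """Smallest similarity between any item in a and any item in b."""
    return min(pair_similarities(sim, a, b))


def average_link(sim, a, b):
    """Average similarity over all item pairs drawn from a and b."""
    return mean(pair_similarities(sim, a, b))


# Toy similarity matrix for four patterns (illustrative values).
sim = [
    [1.0, 0.875, 0.25, 0.125],
    [0.875, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.75],
    [0.125, 0.125, 0.75, 1.0],
]
a, b = (0, 1), (2, 3)
scores = {
    "single": single_link(sim, a, b),      # 0.5
    "complete": complete_link(sim, a, b),  # 0.125
    "average": average_link(sim, a, b),    # 0.25
}
```

Because the scores are similarities rather than distances, single-link takes the maximum and complete-link the minimum, which is the reverse of the more common distance-based formulation.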

HCA stage 240 may provide an interface that allows the user to intersect a 
number of branches of the tree, where each intersected branch corresponds to a cluster 
(form) containing patterns with similarity greater than the intersection number. The user 
interface may be in the form of a vertical bar 2310. Thus, the form bar segments the 
dendrogram into a number of clusters, where the number of clusters or forms will vary 
depending on the horizontal positioning of the form bar. HCA stage 240 may select an 
optimum position for the form bar, or cutoff similarity, based on the similarities determined 
in stage 230. The optimum position of the form bar may be selected at a point between 0.0 
and 1.0 and may be adjusted up or down based on the similarity of the patterns. 
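
The effect of positioning the form bar can be illustrated by cutting a list of merges at a cutoff similarity. The sketch below is an assumption-laden illustration: the merge list format, the function name `cut_dendrogram`, and the choice to keep merges whose similarity is greater than or equal to the cutoff are all hypothetical, and it presumes merge similarities never increase as clustering proceeds, which holds for the single-, complete-, and average-link rules.

```python
def cut_dendrogram(n_items, merges, cutoff):
    """Return the clusters (forms) that remain when only merges with
    similarity >= cutoff are kept."""
    parent = list(range(n_items))

    def find(i):
        # Follow parent links to the representative of item i.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, similarity in merges:
        if similarity >= cutoff:
            parent[find(a[0])] = find(b[0])

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Merges for four patterns as (cluster_a, cluster_b, similarity),
# using illustrative values.
merges = [
    ((0,), (1,), 0.9),
    ((2,), (3,), 0.8),
    ((0, 1), (2, 3), 0.3),
]
forms_loose = cut_dendrogram(4, merges, cutoff=0.5)
forms_strict = cut_dendrogram(4, merges, cutoff=0.85)
forms_all = cut_dendrogram(4, merges, cutoff=0.0)
```

Raising the cutoff yields more, tighter forms; lowering it yields fewer, broader ones, which mirrors moving the form bar left or right across the dendrogram.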

In addition, HCA stage 240 may provide for a post-HCA mixture analysis. 
In post-HCA mixture analysis, representative peaks for a first cluster may be compared to 
combinations of two or more clusters searching for combinations of clusters having peaks 
that match the first cluster. This may be repeated across all clusters, flagging mixtures for 
the operator. For example, in an HCA analysis yielding 10 clusters, the first cluster may be 
compared to various combinations of the 2nd through 10th clusters searching for matching of 
characteristic peaks of the first cluster with characteristic peaks of the combined clusters. 
This may continue for each of the 2nd through 10th clusters. 
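
The mixture search described above might be sketched as follows. The data layout (a dictionary mapping a cluster identifier to a list of representative peak positions in degrees 2Theta), the matching tolerance, and the requirement that every component of a flagged combination contribute at least one matching peak are all assumptions made for illustration.

```python
from itertools import combinations


def _matches(peak, peak_list, tol):
    """True if any peak in peak_list lies within tol degrees of peak."""
    return any(abs(peak - p) <= tol for p in peak_list)


def find_mixtures(cluster_peaks, tol=0.2, max_components=3):
    """Flag clusters whose peaks are explained by a combination of other clusters.

    cluster_peaks: dict mapping cluster id -> list of peak positions (degrees).
    Returns a list of (cluster_id, combination) tuples for the operator.
    """
    flagged = []
    ids = sorted(cluster_peaks)
    for target in ids:
        target_peaks = cluster_peaks[target]
        others = [c for c in ids if c != target]
        for size in range(2, max_components + 1):
            for combo in combinations(others, size):
                # Every peak of the target must appear in some component.
                covered = all(
                    any(_matches(p, cluster_peaks[c], tol) for c in combo)
                    for p in target_peaks
                )
                # Every component must explain at least one target peak.
                each_used = all(
                    any(_matches(p, cluster_peaks[c], tol) for p in target_peaks)
                    for c in combo
                )
                if covered and each_used:
                    flagged.append((target, combo))
    return flagged
```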

In addition, stages 230 and 240 may be performed separately based on the type of 
patterns analyzed. For example, crystalline forms may be only matched against crystalline 
forms, amorphous or other forms that generate broad features may be only matched against 
other forms that generate broad features, and mixtures of broad feature and crystalline forms 
may be only matched against mixtures of broad feature and crystalline forms. Also, the 
pattern matching algorithm used may vary depending on the type of peak. For example, the 
peak matching algorithm may be utilized with crystalline forms, and the envelope matching 
algorithm may be utilized with forms that generate broad features. 

Figure 3 is a flowchart of the operation of the peak comparison methodology of the 
Analysis System consistent with the present invention. At stage 210, a pattern is received as 
previously described. At stage 310, the pattern may be pre-processed. Pre-processing the 
pattern may comprise one or more of: correcting for baseline shift, smoothing the pattern, 
removing broad features, computing variance, and detecting the potential presence of 
preferred orientation and particle statistics (any reference to preferred orientation and 
particle statistics shall be interpreted in both the conjunctive and the disjunctive 
sense). Pre-processing stage 310 is further explained with reference to Figure 4 that follows. 
At stage 320, the peaks of the pattern may be detected, listed, and categorized. At stage 
330, the listed and categorized peaks of the pattern may be compared to the listed and 
categorized peaks of the other sample patterns. The result of stage 330 may be a measure of 
the similarity between the pattern and other patterns. Finally, as previously described, the 
similarity measure of the patterns is used to perform HCA analysis at stage 240. 

Figure 4 is a flowchart of the peak comparison pre-processing method 310 
consistent with the present invention. At stage 405, the pattern intensities may be 
normalized to a scale of [0,1] to reduce the common effects of preferred orientation 
and particle statistics. In addition, the pattern may be truncated to a standard x range 
used in the pattern matching, for example 2.5° to 40°. Data outside of the truncated range 
may be discarded. At stage 410, the baseline of the normalized, truncated pattern is 
detected and the pattern may be baseline corrected. Figure 10 illustrates a raw input pattern 
1020. Notice that there is a general shift in the pattern from the upper left to the lower right. 
This is a baseline shift. Figure 11 illustrates the detected baseline 1110 of pattern 1020. 
Stage 410 may examine the local minima across a sliding window of pattern 1020 to 
determine baseline 1110 or employ a digital filter algorithm for a similar purpose. 
Following baseline correction, a baseline corrected pattern 1210, illustrated in Figure 12, 
results. 
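
A crude sketch of the sliding-window approach to baseline detection is given below. It assumes the pattern is a list of intensities on an evenly spaced grid and takes the minimum within a window centered on each point as the baseline estimate; the window size and function names are illustrative assumptions, and a practical implementation would likely smooth the resulting baseline.

```python
def detect_baseline(intensities, half_window=10):
    """Estimate the baseline as the local minimum in a sliding window."""
    n = len(intensities)
    baseline = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline.append(min(intensities[lo:hi]))
    return baseline


def correct_baseline(intensities, half_window=10):
    """Subtract the detected baseline from the pattern."""
    baseline = detect_baseline(intensities, half_window)
    return [y - b for y, b in zip(intensities, baseline)]
```

The window should be wider than the widest peak of interest; otherwise the estimate follows the peak itself and the peak is partly removed.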

At stage 420, the pattern is smoothed. Any of a number of smoothing 
algorithms or filters may be used to smooth the pattern, for example, Savitzky-Golay 
smoothing or digital filtering. Figure 13 illustrates a pattern 1310 prior to smoothing. 
Figure 14 illustrates a smoothed pattern 1410 based on the pattern 1310. 
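
As one illustration of Savitzky-Golay smoothing, the sketch below applies the standard 5-point quadratic/cubic convolution weights (-3, 12, 17, 12, -3)/35 to an evenly spaced pattern. The choice of a 5-point window and the decision to leave the two points at each end unsmoothed are assumptions made for brevity.

```python
# Standard 5-point quadratic/cubic Savitzky-Golay weights (to be divided by 35).
SG5 = (-3, 12, 17, 12, -3)


def savgol5(values):
    """Smooth evenly spaced data with a 5-point Savitzky-Golay filter."""
    n = len(values)
    out = list(values)  # the two points at each end are left as-is
    for i in range(2, n - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5, window)) / 35.0
    return out
```

Unlike a plain moving average, this filter reproduces any locally cubic trend exactly, so peak heights are distorted less.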

Smoothing and baseline correction may be used together during pre- 
processing to yield a smoothed, baseline corrected pattern, such as pattern 1510 in Figure 
15. 

At stage 430, any broad features of the pattern may be detected and removed. 
Broad features might be produced by amorphous components, disordered crystalline forms, 
or parasitic scatter from the main beam. Pattern 1710 of Figure 17 illustrates a pattern with 
a broad feature. Stage 430 detects the component, illustrated as component 1810, within 
pattern 1710. Stage 430 may detect the component 1810 by applying a heavy and repeated 
smoothing filter to pattern 1710. Any patterns with broad features detected may be 
segregated out and matched and clustered separately from patterns that are crystalline and 
without broad features. 

At stage 440, the pre-processing method 310 determines the variance of the 
pattern. This variance is stored for later use by other portions of the algorithm 300, 
specifically, for example, for use in peak detection. 

At stage 450, the pre-processing method 310 may detect the potential 
presence of preferred orientation and particle statistics of the sample from the pattern. 
Preferred orientation and particle statistics effects are detected if a few peaks are abnormally high 
when compared to the rest of the peaks. In addition, the noise level of the pattern (possibly 
represented by the variance) may be considered in making this determination as patterns 
with potential presence of preferred orientation and particle statistics tend to exhibit low 
levels of noise after normalization. The potential presence of preferred orientation and 
particle statistics is flagged and parameters in the rest of the method, for example, the peak 
detection algorithm, may be adjusted based on this flag. Additionally, the location of these 
peaks may be stored. For example, pattern 1910 might reveal a potential presence of 
preferred orientation and particle statistics. In addition, noise may be detected and used to adjust 
pattern matching parameters. 

Figure 5 is a flowchart of the peak detection method 320 consistent with the 
present invention. At stage 510, the characteristic peaks are detected. These peaks are 
points on the pattern that are greater than a minimum height, greater than a minimum width 
and with a degree of lateral space from their nearest neighbors. Stage 510 is more fully 
explained later with reference to Figure 6. At stage 520, probability scores are assigned. 
Probability scores may be based on the height, width, and neighbors of the characteristic 
peaks. Stage 520 yields a list of characteristic peaks and scores ranging, for example, 
between 0 and 100%. Stage 520 is more fully explained with reference to Figure 7. 

At stage 530, the characteristic peaks may be allocated into discrete groups 
based on their associated probability score. For example, major peaks may be grouped into 
Group 1, lesser peaks into Group 2, and so on through Group 4 (minor peaks). Group 1 
may comprise characteristic peaks with scores greater than 75%; group 2 may comprise 
characteristic peaks with scores greater than 50% to 75%; group 3 may comprise 
characteristic peaks with scores greater than 25% to 50%; and group 4 may comprise 
characteristic peaks with scores between 0% and 25%. Figure 16, discussed later, illustrates 
characteristic peaks placed into groups. Those skilled in the art would appreciate that more 
or fewer than four groups may be utilized and ranges may vary in discretely allocating the 
peaks. 
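
The example allocation above can be expressed as a short function. This is a sketch using the four example ranges; the function name is hypothetical, and placing scores at or below 25% (including any negative totals) into Group 4 is an assumption.

```python
def assign_group(score):
    """Map a probability score in percent to a peak group (1 = major, 4 = minor)."""
    if score > 75:
        return 1
    if score > 50:
        return 2
    if score > 25:
        return 3
    return 4
```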

Figure 6 is a flowchart of the characteristic peak determination method 510 
consistent with the present invention. At stage 610, the process begins at a first point within 
the pattern. Every single data point may be processed through the methodology of stages 
620 - 660, or to speed up the process fewer points may be processed, for example every 
other point may be processed. In general, characteristic peak determination method 510 
looks for peaks of a significant amplitude and width relative to the pattern. 

At stage 620, the method looks to see if there are any points of the same or 
greater magnitude within x degrees of the examined point. If so, processing proceeds to 
stage 660 and the next point is selected. If not, the point appears to be a local maximum and 
flow proceeds to stage 630. At stage 630, the height and width of the candidate point are 
determined by examining the points of inflection on either side of the candidate point. 

At stage 640, if the peak, or candidate point, has a height greater than a 
minimum height and a width greater than a minimum width, the candidate point is stored in 
a list or table as a characteristic peak at stage 650. In addition to the candidate point, the 
two inflection points may be stored as well, signifying the beginning, top, and end of the 
peak. The variance determined during the pre-processing stage may be used to 
automatically determine minimum height requirements. Minimum height may also be 
manually set. Minimum peak width may be manually set or may be automatically set based 
on instrument resolution. 

At stage 660, the next point is selected until stage 510 is complete. 
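
Stages 620 through 660 might be sketched as follows. The sketch assumes a baseline corrected, smoothed pattern given as parallel lists `x` (degrees) and `y` (intensity), takes the peak height as the intensity at the maximum, and locates the inflection points as the first points on either side where the discrete second difference stops being negative. The default thresholds and the dictionary layout are illustrative assumptions.

```python
def find_characteristic_peaks(x, y, window=0.2, min_height=0.05, min_width=0.05):
    """Return candidate peaks as dicts with start, top, end, height and width."""
    peaks = []
    n = len(y)
    for i in range(1, n - 1):
        # Stage 620: skip the point if any other point within `window`
        # degrees has the same or greater magnitude.
        if any(
            y[j] >= y[i]
            for j in range(n)
            if j != i and abs(x[j] - x[i]) <= window
        ):
            continue
        # Stage 630: walk outward while the curve is still concave down;
        # the stopping points approximate the inflection points.
        left = i
        while left > 1 and y[left - 1] - 2 * y[left] + y[left + 1] < 0:
            left -= 1
        right = i
        while right < n - 2 and y[right - 1] - 2 * y[right] + y[right + 1] < 0:
            right += 1
        height = y[i]
        width = x[right] - x[left]
        # Stages 640-650: keep the candidate if it is tall and wide enough.
        if height > min_height and width > min_width:
            peaks.append({
                "start": x[left], "top": x[i], "end": x[right],
                "height": height, "width": width,
            })
    return peaks
```

The neighbor scan here is quadratic in the number of points; for evenly spaced data it could be limited to the indices that fall inside the window.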

Figure 7 is a flowchart of the probability assignment method 520 consistent 
with the present invention. At stage 710, the processing begins and recurs through stages 
720-760 until all desired characteristic peaks have been scored with a probability 
assignment. At stage 720, points are assigned to the peak based on the height of the peak. 
Scores may be assigned based on the multiple of threshold values of the height of the peak. 
The threshold value may be manually assigned or determined based on the previously 
computed variance (noise level). The threshold value may also be based on the presence of 
preferred orientation and particle statistics. For example, a peak that is five thresholds high 
may be given a height score of 50%. Conversely, a peak that does not meet a minimum 
height threshold multiple can incur a negative height score. 

At stage 730, points are assigned to the peak based on the width of the peak. 
For example, for every .05 degree in width of the peak past a certain threshold, the width 
score may be given a +5%. So, in this example, a peak that is 1 degree wide may be given a 
width score of 100% [(1/.05) = 20 x 5% = 100%]. Again, if the width is below a certain 
threshold a negative width score may be assigned. 

At stage 740, points are assigned to the peak based on the neighborhood of 
the peak. For example, if there is nothing in the neighborhood of the peak, for example 
within .2 degrees, then the peak may be given a neighborhood score of +30%. If there is 
something on one side of the peak but not another, the peak may be given a neighborhood 
score of +15%. But, if the peak is in a crowded neighborhood, i.e. peaks on either side of 
the peak, the neighborhood score might be -30%. 

As will be appreciated by those skilled in the art, various weightings and 
scores may be assigned to the height, width, and neighborhood scoring factors. Other peak 
characteristics may also be used for scoring. 

At stage 750, the scores for the height, width, and neighborhood may be 
summed and stored in association with the peak in the characteristic peak list or table. At 
stage 760, the next characteristic peak is selected and analyzed through stages 710-750 until 
method 520 is complete. Then, flow proceeds to stage 530 (Figure 5) for placing the 
characteristic peaks into groups based on the scores. 
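
The scoring of stages 720 through 750 might be sketched as below, using the example values from the text: 10% per threshold multiple of height (so five thresholds give 50%), +5% per .05 degree of width, and +30%, +15%, or -30% for the neighborhood. The size of the negative scores, the zero default width threshold, and the clamping of the total to the range 0 to 100% are assumptions.

```python
def height_score(height, threshold, per_multiple=10.0, penalty=-10.0):
    """Stage 720: score by how many threshold multiples high the peak is."""
    multiple = height / threshold
    return penalty if multiple < 1.0 else per_multiple * multiple


def width_score(width, threshold=0.0, step=0.05, per_step=5.0, penalty=-10.0):
    """Stage 730: +5% for every .05 degree of width past the threshold."""
    if width < threshold:
        return penalty
    return per_step * (width - threshold) / step


def neighborhood_score(left_gap, right_gap, radius=0.2):
    """Stage 740: gaps are distances to the nearest peak on each side, or None."""
    free_left = left_gap is None or left_gap > radius
    free_right = right_gap is None or right_gap > radius
    if free_left and free_right:
        return 30.0
    if free_left or free_right:
        return 15.0
    return -30.0


def total_score(height, threshold, width, left_gap, right_gap):
    """Stage 750: sum the three scores and clamp to the range 0 to 100."""
    total = (height_score(height, threshold)
             + width_score(width)
             + neighborhood_score(left_gap, right_gap))
    return max(0.0, min(100.0, total))
```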

Figure 8 is a flowchart of the peak pattern matching method 330 consistent 
with the present invention. After receipt, optional pre-processing, and determining the 
characteristic peaks for all patterns upon which a user may want to run HCA, each pattern 
may be compared to other patterns to determine a similarity. Stages 810, 820, 840, and 850 
operate to compare each pattern to every other pattern. Stage 830 performs the comparison 
by comparing each characteristic peak in Sample i with characteristic peaks in Sample j to 
look for matches. The result of the comparison is a similarity score. 

Figure 9 is a flowchart of the peak comparison method 830 consistent with 
the present invention. Consistent with the present invention, peak comparison method 830 
compares the Group 1 and Group 2 peaks in Sample i, a first sample, to see if there are 
comparable characteristic peaks in Sample j, a second sample. Sample i Group 1 peaks may 
be found if there are corresponding Group 1, 2, or 3 characteristic peaks in Sample j. 
Sample i Group 2 peaks may be found if there are corresponding Group 1, 2, 3, or 4 peaks 
in Sample j. A corresponding peak is one at the same degree position along the X axis. The 
same degree position may range from tight, e.g. within .1 degree, to loose, e.g. within 1.5 
degrees. This may be set automatically based on the resolution of the instrumentation or 
manually set. Where Group 1 peaks are missing, a first penalty for similarity may be 
applied. Where Group 2 peaks are missing, a second penalty for similarity may be applied. 
The penalties are totaled to yield a value representing the similarity between the patterns of 
Sample i and Sample j. A similarity of 0 would be a perfect match. 

Stages 905 - 925 represent the analysis of Group 1 peak matching. Stages 
930 - 950 represent the analysis of Group 2 peak matching. At stage 905, the first Group 1 
peak of Sample i is selected. At stage 910, a check is made to determine if there are any 
Group 1, 2, or 3 peaks in Sample j that correspond to this peak of Sample i. If there are, at 
stage 925 no penalty is imposed and processing continues at stage 920 where the next 
Group 1 peak is selected. If there are no matching peaks, at stage 915 a penalty is 
imposed to the similarity score of Sample i to Sample j. This penalty may be, for example, 
.6. At stage 920, the next Group 1 peak is selected until all Group 1 peaks of Sample i are 
complete. 

At stage 930, the first Group 2 peak of Sample i is selected. At stage 935, a 
check is made to determine if there are any Group 1, 2, 3, or 4 peaks in Sample j that 
correspond to this peak of Sample i. If there are, at stage 950 no penalty is imposed and 
processing continues at stage 945 where the next Group 2 peak is selected. If there are no 
matching peaks, at stage 940, a penalty is imposed to the similarity score of Sample i to 
Sample j. This penalty may be, for example, .3. At stage 945, the next Group 2 peak is 
selected until all Group 2 peaks of Sample i are complete. Method 830 ends at stage 955. 
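
The one-way comparison of stages 905 through 955 might be sketched as follows. Each sample is assumed to be a list of `(position_in_degrees, group)` tuples; the tolerance default and function names are illustrative, and the penalties default to the example values of .6 and .3 given above.

```python
def _has_match(position, peaks, allowed_groups, tol):
    """True if a peak of an allowed group lies within tol degrees of position."""
    return any(
        group in allowed_groups and abs(pos - position) <= tol
        for pos, group in peaks
    )


def one_way_penalty(sample_i, sample_j, tol=0.2,
                    group1_penalty=0.6, group2_penalty=0.3):
    """Total penalty for Group 1 and Group 2 peaks of sample_i missing in sample_j."""
    total = 0.0
    for position, group in sample_i:
        if group == 1 and not _has_match(position, sample_j, {1, 2, 3}, tol):
            total += group1_penalty
        elif group == 2 and not _has_match(position, sample_j, {1, 2, 3, 4}, tol):
            total += group2_penalty
    return total
```

Because the comparison is directional, the penalty of Sample i against Sample j generally differs from that of Sample j against Sample i; a two-way score can be formed by adding the two.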

During peak comparison, the algorithm may treat overlapped peaks, split peaks (a 
peak having been bifurcated into two peaks with a depression in between) and shoulder 
peaks (a first greater peak having a second lesser peak sprouting prior to the first peak's true 
inflection point), as multiple peaks if they are present in more than one pattern. If one 
pattern exhibits a split peak and one pattern exhibits a peak with a shoulder at the same 
position, they may be matched. 

In addition, the peak matching algorithm may ignore, and choose not to perform 
matching, on high angle (high 2Theta) Group 2 peaks. For example, the 2Theta cutoff point 
may be determined by the equation 2Theta_Cut_Off = 2.0*asin(5.0*sin(2Theta_1/2.0)), 
where 2Theta_1 is the measured 2Theta angle of the lowest angle diffraction peak. 
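
A small sketch of the cutoff calculation follows. It assumes angles are supplied in degrees, and it returns `None` when five times the sine exceeds 1, where the arcsine is undefined; that handling is an assumption. By Bragg's law the cutoff corresponds to a d-spacing one fifth of that of the lowest angle peak.

```python
import math


def two_theta_cutoff(two_theta_1_deg):
    """Cutoff angle in degrees from the lowest angle peak position in degrees."""
    s = 5.0 * math.sin(math.radians(two_theta_1_deg) / 2.0)
    if s > 1.0:
        return None  # no cutoff can be computed for this peak
    return math.degrees(2.0 * math.asin(s))
```

For a lowest angle peak at 5 degrees 2Theta, the cutoff is roughly 25.2 degrees.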

In addition, the algorithm may detect and flag missing families of peaks with 
common 'd' values, indicating the possible presence of preferred orientation. If such peaks 
are detected they may be included in the pattern matching as if they were physically present 
in the pattern. If a peak is missing at a particular 2Theta value, then the program looks for 
missing peaks at 2Theta values given by 2 asin(n sin(2Theta/2)), where n takes the values 1, 2, 3, 
4. 

A user may intervene in the method 830 to X-shift by a real number of degrees 
forward or backward to attempt to better align patterns for matching. X-shifting may be 
necessitated by instrumentation errors or variations. The method 830 may also be set to 
automatically perform some X-shifting to look for a better match, for example, if the 
algorithm determines that there is a constant X-shift between the peaks of the two patterns. 

The resulting scores are used in the HCA described with reference to HCA 
method 240. Notice that method 830 yields scores of 0.0 to infinity, where 0.0 denotes a 
perfect match. Prior to the HCA the similarity scores are all scaled from 1.0 to 0.0, where 
1.0 denotes a perfect match. Initially, HCA defines every pattern as a separate cluster. The 
two most similar clusters are aggregated into a cluster. The clustering then repeats until all 
clusters are joined together. The resulting clustering is displayed in a tree structure, known 
as a dendrogram. Figures 23a and 23b, to be discussed later, illustrate an exemplary 
dendrogram. The vertical axis displays each sample. Patterns that are similar are clustered 
together toward the left portion of the horizontal axis. As similarity diverges, the clusters 
are grouped together toward the right portion of the horizontal axis. Thus, moving from left 
to right, the horizontal axis displays lesser degrees of similarity. 
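
The agglomeration described above might be sketched as follows, using average-link similarity between clusters. The text does not state how penalty scores are scaled into the range 1.0 to 0.0, so the mapping 1/(1 + penalty) used here is purely an assumption, as are the function names.

```python
def scale_to_similarity(penalty):
    """Map a penalty in [0, infinity) to a similarity in (0, 1]; 0 maps to 1.0."""
    return 1.0 / (1.0 + penalty)


def average_link(sim, a, b):
    """Average similarity from any item in cluster a to any item in cluster b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def hierarchical_clustering(sim):
    """Return the merge history as (similarity, cluster_a, cluster_b) tuples."""
    clusters = [[i] for i in range(len(sim))]  # every pattern starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = average_link(sim, clusters[p], clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        s, p, q = best
        merges.append((s, clusters[p], clusters[q]))
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return merges
```

The merge history holds what is needed to draw a dendrogram: each entry gives the similarity at which two branches join.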

HCA stage 240 may provide a form bar, a vertical line that intersects a 
number of branches of the tree, where each intersected branch represents a form. Thus, the 
form bar segments the dendrogram into a number of clusters, where the number of clusters 
or forms will vary depending on the horizontal positioning of the form bar. HCA stage 240 
may select an optimum position for the form bar based on the similarities determined in 
stage 230. Those skilled in the art will appreciate that many other types of user interfaces 
for segmenting the dendrogram into clusters can be envisioned. 

Figure 16 is an illustration of the smoothed, baseline corrected diffraction 
pattern with the peaks detected and categorized according to methods consistent with the 
present invention. Smoothed, baseline corrected pattern 1510 has been broken down into 
characteristic peaks categorized in groups. Group 1 characteristic peaks 1610 are the largest 
peaks in the pattern and carry the most weight in matching. Group 2 characteristic peaks 
1620 are standard peaks in the pattern and carry less weight. 

Figure 20 is an illustration of the first smoothed, baseline corrected 
diffraction pattern compared to a second smoothed, baseline corrected diffraction pattern 
consistent with the present invention. Some of the peaks of the first pattern are missing 
from the second pattern, and some of the peaks of the second pattern are missing from the 
first pattern. 

Figure 21 is an illustration of the missing Group 1 and Group 2 peaks found 
in the first smoothed, baseline corrected diffraction pattern but missing in the second 
smoothed, baseline corrected diffraction pattern consistent with the present invention. 
There is a single missing Group 1 major peak 2110 which would cause a .66 penalty to the 
similarity score. There are three missing Group 2 standard peaks 2120 which would cause a 
penalty of .9 (.3 x 3). This would result in a total similarity of 1.56 of the first compared to 
the second. 

Figure 22 is an illustration of the missing Group 1 and Group 2 peaks found 
in the second smoothed, baseline corrected diffraction pattern but missing in the first 
smoothed, baseline corrected diffraction pattern consistent with the present invention. 
There are three missing Group 1 major peaks 2210 which would cause a 1.98 (.66 x 3) 
penalty to the similarity score. There is one missing Group 2 standard peak 2220 which 
would cause a penalty of .3. This would result in a total similarity of 2.28 of the second 
compared to the first. If these similarity scores are totaled, the total two-way similarity 
would be 2.28 + 1.56 = 3.84. 

As previously mentioned, peak matching is useful for identifying similar unit 
cells and crystal symmetry. However, intensity envelope matching is useful for identifying 
isostructures of the crystalline forms and clustering disordered forms with ordered forms. 

Figure 24 is a flowchart of the operation of the intensity envelope 
comparison methodology of the Analysis System consistent with the present invention. At 
stage 210, a pattern is received as previously described. At stage 2410, the pattern may be 
pre-processed. Pre-processing the pattern may comprise one or more of: scaling the pattern 
into a common measurement range; scaling the pattern into a common step size; 
normalizing the pattern; and smoothing the pattern. Intensity envelope pre-processing stage 
2410 is further explained with reference to Figure 25 that follows. At stage 2530, the 
intensity envelope of the pattern may be compared to the intensity envelopes of the other 
sample patterns. The result of stage 2530 may be a measure of the similarity between the 
pattern and other patterns. Finally, as previously described, the similarity measure of the 
patterns is used to perform HCA analysis at stage 240. 

Figure 25 is a flowchart of the operation of the intensity envelope 
comparison pre-processing methodology 2410 consistent with the present invention. At 
stage 2510, the pattern may be smoothed. At stage 2520, the pattern is processed to be in a 
common measurement range with the other patterns. At stage 2530, the pattern is processed 
to have a common step size. Instrumentation may vary in step size, for example one 
instrument may be .02 degrees and another instrument .05 degrees. At stage 2540, the 
pattern is normalized. In this stage the weight, or integrated intensity, is normalized or 
standardized across all patterns. 
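
Bringing patterns onto a common range and step size and normalizing their integrated intensity might be sketched as follows. Linear interpolation and a rectangle-rule estimate of the integrated intensity are assumptions, as are the function names; the input positions are assumed to be ascending and to span the requested range.

```python
def resample(x, y, start, stop, step):
    """Linearly interpolate (x, y) onto an even grid from start to stop."""
    n = int(round((stop - start) / step)) + 1
    grid = [start + k * step for k in range(n)]
    out = []
    j = 0
    for g in grid:
        # Advance to the input segment [x[j], x[j + 1]] containing g.
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, out


def normalize_area(y, step):
    """Scale intensities so that the integrated intensity sum(y) * step is 1."""
    area = sum(y) * step
    return [v / area for v in y]
```

Once every pattern shares one grid and unit integrated intensity, the envelopes can be compared point by point.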

Figure 26 is a flowchart of the intensity matching method 2420 consistent 
with the present invention. After receipt and pre-processing, each pattern may be compared 
to all other patterns to determine a similarity based on the intensity envelope. Stages 2610, 
2630, and 2650 operate to compare each pattern with all other patterns. Stage 2630 
performs the comparison by comparing the general intensity envelope of Sample i with the 
general intensity envelope of all other samples, Samples 1 to N where N is the number of 
samples, using a least squares fitting algorithm. The results of the comparison are a 
percentage score of each sample of Samples 1 to N present in Sample i. As previously 
described, the similarity score is used in the HCA stage 240. 
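
One way to realize such a least squares fit is sketched below: the measured envelope is modeled as a linear combination of the other envelopes, the normal equations are solved by Gaussian elimination, and the coefficients are converted to percentages. Clipping negative coefficients to zero, the percentage convention, and the requirement that the reference envelopes be linearly independent are assumptions, not details taken from the text.

```python
def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_percentages(measured, references):
    """Percent of each reference envelope present in the measured envelope."""
    a = [[_dot(r, s) for s in references] for r in references]
    b = [_dot(r, measured) for r in references]
    coeffs = [max(0.0, c) for c in _solve(a, b)]  # clip negative contributions
    total = sum(coeffs)
    return [100.0 * c / total if total else 0.0 for c in coeffs]
```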

Figure 27 is a graph of a measured diffraction pattern 2710 and a calculated 
pattern 2720 resulting from the least squares fitting of all other patterns consistent with the 
present invention. The measured pattern 2710 has been pre-processed to normalize the 
patterns for comparison. 

Figure 28 is a graph of a plurality of diffraction patterns analyzed according 
to the intensity envelope comparison method and the resulting least squares analysis 
consistent with the present invention. Pattern 2810 matches pattern 2710 with a similarity 
of 56%; pattern 2820 matches pattern 2710 with a similarity of 16%; and pattern 2830 
matches pattern 2710 with a similarity of 0%. These similarity scores may be used for HCA 
to form clusters as previously described. 

In addition, a similar method may be utilized to perform quantitative analysis 
of samples containing either mixed crystalline phases or mixed crystalline and disordered 
phases. The quantification of mixed crystalline and disordered phases is called percentage 
crystallinity analysis. For example, a diffraction pattern from a mixture will contain within 
it the diffraction patterns corresponding to each of the phases present in the mixture. 
Utilizing the above methodology, the presence, by percent weight, of each of the phases 
within the mixture may be analyzed and represented as a weight percent similar to the 
representation of the above similarity percentage. In addition, disordered forms, generated 
as described below, may be presented to the above algorithm for the analysis of the percent 
crystallinity. 

Prior art methods may fail to match forms if there is significant disorder 
present. In other words, forms that should be clustered together may be clustered apart 
because of disorder. In order to match crystalline forms that are disordered, a disorder 
simulation algorithm has been developed to simulate disordered forms that may be compared 
to measured patterns to identify relationships. Through this method, disordered crystalline 
or polymorph forms may be matched with more ordered crystalline or polymorph forms. 

Figure 29 illustrates a disorder simulation algorithm 2900 consistent with the 
principles of the present invention. At stage 2910, a peak list, as previously described, is 
received where the peak list may be from a known, ordered, crystalline form. The peak list 
may be imported as a data file or generated from the previously described pattern matching 
algorithms, for example, as described with reference to Figure 6. An operator may also 
manually enter the peak list. 

In addition, disorder simulation algorithm 2900 may calculate and generate a 
peak list based on a known crystal structure. For calculated patterns, algorithm 2900 may 
apply a Lorentz polarization factor to simulate the characteristics of a peak list generated by 
an X-Ray diffraction instrument. The Lorentz polarization factor may be selected based on 
the characteristics of the particular X-Ray diffraction instrument used to gather data from 
other patterns of interest. The Lorentz polarization factor may be applied to the peak list 
prior to further calculations. 

An example of the use of the Lorentz polarization factor for Theta-2Theta 
scans using a Bragg-Brentano geometry without monochromator crystal may be: 

$$LP = \frac{1 + \cos^2(2\theta)}{\sin(\theta)\,\sin(2\theta)}$$

where $2\theta$ (2Theta) is the measurement angle of the diffraction pattern. 

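
A sketch of applying a Lorentz polarization factor to calculated intensities follows. It assumes the widely used form (1 + cos^2(2Theta)) / (sin(Theta) sin(2Theta)) for an unmonochromated Bragg-Brentano scan and a peak list given as `(two_theta_degrees, intensity)` tuples; both the function names and the peak list layout are illustrative.

```python
import math


def lorentz_polarization(two_theta_deg):
    """Lorentz polarization factor at a measurement angle given in degrees."""
    two_theta = math.radians(two_theta_deg)
    theta = two_theta / 2.0
    return (1.0 + math.cos(two_theta) ** 2) / (math.sin(theta) * math.sin(two_theta))


def apply_lorentz_polarization(peak_list):
    """Scale each calculated intensity by the factor at its peak position."""
    return [(tt, inten * lorentz_polarization(tt)) for tt, inten in peak_list]
```

The factor falls steeply with angle over the low angle range, so it strengthens low angle peaks relative to higher angle ones.
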
At stage 2920, the simulated disordered pattern is generated using the peak 
list. This will be discussed further with reference to Figure 30. 

At stage 2930, the simulated disordered pattern is compared to the measured 
patterns. This may be by using the previously described matching algorithms and 
incorporating the simulated disordered pattern into the matching or HCA engine, or by 
visual inspection (overlaying the simulated pattern over the measured pattern). By 
incorporating the simulated disordered pattern into the matching algorithms, measured 
disordered patterns can be grouped along with crystalline pattern forms, if that is desired, 
facilitating the work of the operator during a polymorph or salt screen. 

Figure 30 illustrates a flowchart of the generation of the simulated disordered 
pattern 2920 from the received peak list. At stage 3010, instrument parameters may be 
simulated. Because the simulated disordered patterns will be compared to measured 
patterns, stage 3010 may generate a crystalline pattern as measured by a selected 
instrument. The instrument function may be modeled by one or more parameters, for 
example, peak shape, background, and noise. The chosen peak shape may be a split Pseudo 
Voigt with independently variable asymmetry and weighting factors. Continuously variable 
power laws may model the peak width, asymmetry, and weighting factor, for example, as a 
function of 2Theta (the measurement angle). An exemplary form of the peak width 
parameter may make use of the well-known Caglioti formula: Peak Full Width = SQRT(U 
tan(Theta) tan(Theta) + V tan(Theta) + W), where U is . . . , V is . . . , and W is . . . . 

The asymmetry and Pseudo Voigt weighting factors may follow similar 
power laws as a function of 2Theta. 

The noise parameter may make use, for example, of Poisson statistics where 
the noise distribution 1 sigma is the square root of the X-ray intensity at each point. 

Simulation of the instrumental function may also make use of the spectral 
signature of the X-ray source. For a fixed tube or rotating anode system, this may imply the 
addition of a K-alpha 2 wavelength component to the simulated pattern. For synchrotron 
data, for example, this data may not be needed. The algorithm may utilize a table 
comprising one or more standard anode materials with their respective default K-alpha 1 
and K-alpha 2 X-ray wavelengths. 

At stage 3020, one or more operator defined microstructure parameters may 
be received. These parameters may include, for example: crystallite size, D, in Angstroms, 
typically between 500 and 20 Angstroms for example; microstrain, E, in percent, typically 
between .1% and 4% for example; thermal strain, alpha, in Angstroms, typically between .1 
and .2 Angstroms for example; and residual strain, E, in Angstroms, typically between .1 
and .2 Angstroms. During simulation at stage 3030, crystallite size and microstrain may 
cause broadening of the diffraction peak. Thermal strain may cause a 2Theta dependent 
dampening of the intensity, and residual strain may cause peak movement. 

For each set of one or more of these input microstructure parameters, a 
disordered diffraction pattern may be simulated, where the simulation includes one or more 
instrumental factors. 

At stage 3030, the material disorder is modeled based on the operator defined 
microstructure parameters received. The microstructure parameters may be 
applied isotropically without knowledge of the underlying crystalline structure. As 
simulated patterns may be combined, it is possible that stage 2920 can be used to model 
complex anisotropic disorder through sequential calculations. 

Crystal size may be modeled in terms of the Scherrer equation, well known 
to those skilled in the art, 

$$\text{PeakBroadening (radians)} = \frac{K\lambda}{D\cos(\theta)}$$

where K is the Scherrer constant (approximately .9), lambda is the X-ray 
wavelength in Angstroms, and D is the crystallite size in Angstroms. 

Microstrain may be modeled, for example, using the strain component of the 
Williamson and Hall model. 

$$\text{PeakBroadening (radians)} = 4E\tan(\theta)$$

These two peak broadening parameters, crystal size and microstrain, may be 
combined with the instrument profile using a Gaussian approximation, for example: 

$$\text{FinalPeakWidth} = \sqrt{H_1^2 + H_2^2 + H_3^2}$$
where H1 is the instrumental profile previously described, H2 is the 
crystallite size profile, and H3 is the microstrain profile. 
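
The width calculations can be collected into a short sketch: a Caglioti instrument width, Scherrer crystallite size broadening K*lambda/(D*cos(Theta)), microstrain broadening 4*E*tan(Theta), and their combination as the square root of the sum of squares. The sketch assumes Theta in radians, the microstrain expressed as a fraction rather than a percent, and all three widths already converted to the same angular unit before they are combined; the function names are hypothetical.

```python
import math


def caglioti_width(theta, u, v, w):
    """Instrument peak full width from the Caglioti formula (theta in radians)."""
    t = math.tan(theta)
    return math.sqrt(u * t * t + v * t + w)


def scherrer_broadening(wavelength, size, theta, k=0.9):
    """Crystallite size broadening in radians; wavelength and size in Angstroms."""
    return k * wavelength / (size * math.cos(theta))


def microstrain_broadening(strain, theta):
    """Microstrain broadening in radians; strain given as a fraction."""
    return 4.0 * strain * math.tan(theta)


def final_peak_width(h1, h2, h3):
    """Combine instrument, size and strain widths in the Gaussian approximation."""
    return math.sqrt(h1 * h1 + h2 * h2 + h3 * h3)
```

Since the Scherrer term varies as 1/cos(Theta) and the strain term as tan(Theta), the two contributions show different angular dependence across a pattern.
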

Thermal strain may be modeled, for example, by the Debye Waller thermal 
factor that damps the measured intensities preferentially at high 2Theta values. The form of 
the Debye Waller factor may be modified to represent random strain within the crystal unit 
cell. 

DBWfactor = e^( . . . ) 

Residual strain may cause peak movement. The form of the peak movement 
may be very similar to the microstrain peak broadening component. 
$$\text{PeakMovement (radians)} = -2E\tan(\theta)$$

Figure 31 is a block diagram of an Analysis System 110 consistent with the 
present invention. As illustrated in Figure 31, a system environment of an Analysis System 
110 may include a display 3110, a central processing unit 3120, an input/output interface 
3130, a network interface 3140 and memory 3150 coupled together by a bus. Analysis 
System 110 is adapted to include the functionality and computing capabilities to receive 
diffraction data from Instrumentation 100 and to pre-process the diffraction data, match the 
diffraction data between samples, and perform HCA on the results of the sample matching 
scores. The input, output, and monitoring of the system may be provided on display 3110 
for viewing. 

As shown in Figure 31, Analysis System 110 may comprise a PC or 
mainframe computer for performing various functions and operations consistent with the 
invention. Analysis System 110 may be implemented, for example, by a general purpose 
computer selectively activated or reconfigured by a computer program stored in the 
computer, or may be a specially constructed computing platform for carrying-out the 
features and operations of the present invention. Analysis System 110 may also be 
implemented or provided with a wide variety of components or subsystems including, for 
example, one or more of the following: one or more central processing units 3120, a co- 
processor, memory 3150, registers, and other data processing devices and subsystems. 
Analysis System 110 may also communicate or transfer XRD sample data, matching scores, 
HCA results or other data via I/O interface 3 130 and/or network interface 3 140 through the 
use of direct connections or communication links to other elements of the present invention. 
For example, a firewall in network interface 3 140 prevents access to the platform by 
unpermitted outside sources. 
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Alternatively, communication within Analysis System 110 can be achieved 
through the use of a network architecture (not shown). In the altemative embodiment (not 
shown), the network architecture may comprise, alone or in any suitable combination, a 
telephone-based network (such as a PBX or POTS), a local area network (LAN), a wide 
area network (WAN), a dedicated intranet, and/or the Internet. Further, it may comprise 
any suitable combination of wired and/or wireless components and systems. By using 
dedicated communication links or shared network architecture, Analysis System 110 may 
be located in the same location or at a geographically distant location from Instramentation 
100. 

I/O interface 3130 of the system environment shown in Figure 31 may be 
implemented with a wide variety of devices to receive and/or provide the data to and from 
Analysis System 110. I/O interface 3130 may include an input device, a storage device, 
and/or a network. The input device may include a keyboard, a mouse, a disk drive, video 
camera, magnetic card reader, or any other suitable input device for providing data to 
Analysis System 110. 

Network interface 3140 may be connected to a network, such as a Wide Area
Network, a Local Area Network, or the Internet for providing read/write access to records. 

Memory device 3150 may be implemented with various forms of memory or 
storage devices, such as read-only memory (ROM) devices and random access memory 
(RAM) devices. Memory device 3150 may also include a memory tape or disk drive for 
reading and providing records on a storage tape or disk as input to Analysis System 110. 
Memory device 3150 may comprise computer instructions forming: an operating system 
3152 and one or more modules 3154, 3156, 3158, 3160, and 3162. 

As previously illustrated, patterns and dendrograms may be produced by the
present invention. To facilitate user interaction with the system, a set of user tools may be
provided consistent with the present invention. Patterns may be shifted in the X or Y
directions, or combinations thereof. The patterns may be manually shifted into different
clusters or re-sorted. In addition, as previously mentioned, the user may slice the
dendrogram in various ways to change the number of forms selected. In addition, a mixture
tool permits the user to select a series of reference patterns and analyze other patterns to
determine whether each is a mixture of the reference patterns.

Also, a user may subtract a first pattern from a second pattern, wherein the
subtraction of the pattern occurs by the subtraction of like peaks, regardless of the peak size.
For example, subtracting pattern A from pattern B, each of which has characteristic peaks of
varying amplitudes at 2Theta = i, will result in a complete subtraction of the peak to a zero
level regardless of the actual amplitude differences. The resulting pattern from the above
subtraction operation may be utilized as an input pattern in matching or HCA operations.
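
The like-peak subtraction described above can be sketched as follows. This is only an illustration under stated assumptions: each pattern is represented as a list of (2Theta, amplitude) peaks, two peaks count as "like" when their positions agree within a tolerance, and the name `subtract_like_peaks` and the default tolerance of 0.1 are hypothetical choices that the description does not specify.

```python
def subtract_like_peaks(pattern_a, pattern_b, tolerance=0.1):
    """Subtract pattern A from pattern B by removing like peaks.

    Each pattern is a list of (two_theta, amplitude) tuples. Any peak in
    pattern B whose position lies within `tolerance` of a peak in pattern A
    is removed completely, whatever the two amplitudes are.
    """
    positions_a = [position for position, _ in pattern_a]
    return [
        (position, amplitude)
        for position, amplitude in pattern_b
        if not any(abs(position - pos_a) <= tolerance for pos_a in positions_a)
    ]


if __name__ == "__main__":
    # Made-up peak lists: the peaks near 10 and 20 degrees are shared.
    pattern_a = [(10.0, 50.0), (20.0, 5.0)]
    pattern_b = [(10.05, 500.0), (15.0, 80.0), (20.0, 1.0)]
    print(subtract_like_peaks(pattern_a, pattern_b))
```

Because matching is by position only, a strong peak in pattern B is removed even when the like peak in pattern A is much weaker, which mirrors the subtraction to a zero level described in the text.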

Those skilled in the art will appreciate that all or part of systems and
methods consistent with the present invention may be stored on or read from other
machine-readable media, such as: secondary storage devices, like hard disks, floppy disks, and
CD-ROM; a carrier wave received from the Internet; or other forms of machine-readable
memory, such as read-only memory (ROM) or random-access memory (RAM).

Furthermore, one skilled in the art will also realize that the processes
illustrated in this description may be implemented in a variety of ways and include multiple
other modules, programs, applications, scripts, processes, threads, or code sections that all
functionally interrelate with each other to accomplish the individual tasks described above
for each module, script, and daemon. For example, it is contemplated that these program
modules may be implemented using commercially available software tools, using custom
object-oriented code written in the C++ programming language, using applets written in the
Java programming language, or may be implemented with discrete electrical components
or as one or more hardwired application specific integrated circuits (ASIC) custom designed
just for this purpose.

It will be readily apparent to those skilled in this art that various changes and
modifications of an obvious nature may be made, and all such changes and modifications
are considered to fall within the scope of the appended claims. Other embodiments of the
invention will be apparent to those skilled in the art from consideration of the specification
and practice of the invention disclosed herein. It is intended that the specification and
examples be considered as exemplary only, with a true scope and spirit of the invention
being indicated by the following claims and their equivalents.

What is claimed is:

1. A method of analyzing diffraction patterns, comprising:
receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;
determining a first similarity between the first and the second diffraction patterns;
determining a second similarity between the first and the third diffraction patterns;
determining a third similarity between the second and the third diffraction patterns; and
performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first similarity, the second similarity, and the
third similarity.

2. The method of claim 1, further comprising normalizing the intensity
of the diffraction patterns.

3. The method of claim 1, further comprising angle truncation of the
diffraction patterns.

4. The method of claim 1, further comprising compensating the baseline
of the diffraction patterns.

5. The method of claim 1, further comprising smoothing the diffraction
patterns.

6. The method of claim 1, further comprising removing the broad
features of the diffraction patterns.

7. The method of claim 1, further comprising computing the variance of
the diffraction patterns.

8. The method of claim 1, further comprising detecting the potential
presence of preferred orientation and particle statistics of the diffraction patterns.

9. The method of claim 1, further comprising:
normalizing the intensity of the diffraction patterns;
truncating the angle of the diffraction patterns;
correcting the baseline of the diffraction patterns;
smoothing the diffraction patterns;
removing any broad features of the diffraction patterns;
computing the variance of the diffraction patterns; and
detecting the potential presence of preferred orientation and particle statistics of the
diffraction patterns.

10. The method of claim 1, wherein the similarities are determined based
on the characteristic peaks of the diffraction patterns.

11. The method of claim 10, wherein determining the similarities based
on the peaks comprises:
detecting crystalline peaks in the diffraction patterns; and
matching the diffraction patterns based on the detected crystalline peaks.

12. The method of claim 10, wherein determining the similarities based
on the peaks comprises:
detecting amorphous peaks in the diffraction patterns; and
matching the diffraction patterns based on the detected amorphous peaks.

13. The method of claim 10, wherein detecting characteristic peaks
further comprises:
determining the characteristic peaks of the diffraction patterns;
assigning probability scores to the determined characteristic peaks of the
diffraction pattern; and
discretely allocating the determined characteristic peaks into one or more
groups based on the assigned probability scores.

14. The method of claim 10, wherein matching the diffraction patterns 
further comprises comparing one or more detected characteristic peaks in the first 
diffraction pattern with one or more detected characteristic peaks in the second diffraction 
pattern. 

15. The method of claim 13, wherein discretely allocating the determined 
characteristic peaks comprises discretely allocating the determined characteristic peaks into 
a first, a second, a third, and a fourth group based on the assigned probability scores. 

16. The method of claim 15, wherein matching the diffraction patterns
based on the detected characteristic peaks comprises comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern.

17. The method of claim 16, wherein comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern further comprises:
for each characteristic peak in the first group of the first diffraction pattern,
comparing the characteristic peak in the first group of the first diffraction pattern with the
characteristic peaks in the first, second, or third group of the second diffraction pattern and
penalizing a matching score if the characteristic peak in the first group of the first
diffraction pattern is not found in the first, second, or third group of the second diffraction
pattern.

18. The method of claim 17, wherein comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern further comprises:
for each characteristic peak in the second group of the first diffraction
pattern, comparing the characteristic peak in the second group of the first diffraction pattern
with the characteristic peaks in the first, second, third, or fourth group of the second
diffraction pattern and penalizing a matching score if the characteristic peak in the first
group of the first diffraction pattern is not found in the first, second, third, or fourth group of
the second diffraction pattern.

19. The method of claim 16, wherein matching the diffraction patterns
based on the detected characteristic peaks further comprises comparing one or more
detected characteristic peaks in the second diffraction pattern with one or more detected
characteristic peaks in the first diffraction pattern.

20. The method of claim 1, wherein pre-processing the diffraction
patterns further comprises smoothing the diffraction patterns.

21. The method of claim 1, wherein pre-processing the diffraction
patterns further comprises scaling the diffraction patterns into a common measurement
range.

22. The method of claim 1, wherein pre-processing the diffraction
patterns further comprises scaling the diffraction patterns into a common step size.

23. The method of claim 1, wherein pre-processing the diffraction
patterns further comprises normalizing the diffraction patterns.

24. The method of claim 1, wherein pre-processing the diffraction
patterns further comprises:
smoothing the diffraction patterns;
scaling the diffraction patterns into a common measurement range;
scaling the diffraction patterns into a common step size; and
normalizing the diffraction patterns.

25. The method of claim 1, wherein determining the similarity between
the first diffraction pattern and the second diffraction pattern further comprises matching the
intensity envelopes of the first diffraction pattern with the second diffraction pattern.

26. The method of claim 24, wherein determining the similarity between
the first diffraction pattern and the second diffraction pattern further comprises matching the
intensity envelopes of the first diffraction pattern with the second diffraction pattern.

27. The method of claim 25, wherein matching the intensity envelopes
comprises performing a least squares fitting of the first diffraction pattern and the second
diffraction pattern.

28. The method of claim 1, further comprising generating the third
diffraction pattern instead of receiving the third diffraction pattern.

29. The method of claim 28, wherein the third diffraction pattern is a
simulated pattern of a disordered form and further wherein generating the third diffraction
pattern comprises:
receiving a peak list; and
simulating disorder in the received peak list to generate a disordered pattern
as the third diffraction pattern.

30. The method of claim 29, wherein simulating disorder in the received
peak list comprises:
simulating instrument parameters;
receiving an operator defined microstructure parameter;
modeling material disorder based on the received microstructure parameter; and
optimizing the disordered parameter.

31. The method of claim 1, wherein the hierarchical cluster analysis
further comprises determining a cut off similarity of a dendrogram.

32. The method of claim 21, wherein the cut off similarity is based on the
similarity of the patterns.

33. The method of claim 1, further comprising X-shifting the first
diffraction pattern prior to determining the similarity between the first diffraction pattern
and the second diffraction pattern and determining the similarity between the first
diffraction pattern and the third diffraction pattern.

34. The method of claim 1, further comprising determining preferred
orientation and particle statistics in the first diffraction pattern.

35. The method of claim 16, wherein comparing the peaks further
comprises matching a split peak with a peak having a shoulder as an acceptable match.

36. The method of claim 10, wherein the characteristic peaks are detected 
based upon a threshold value. 

37. The method of claim 36, wherein the threshold value is based on a 
computed variance. 

38. The method of claim 36, wherein the threshold value is based on a 
noise level of the patterns. 

39. The method of claim 1, wherein performing hierarchical cluster
analysis further comprises using the minimum link methodology.

40. The method of claim 1, wherein performing hierarchical cluster
analysis further comprises using the average link methodology.

41. The method of claim 1, wherein performing hierarchical cluster
analysis further comprises using the maximum link methodology.

42. The method of claim 1, wherein hierarchical cluster analysis
comprises identifying one or more clusters and further comprising identifying clusters that
are mixtures of other clusters.

43. The method of claim 10, further comprising identifying the region of 
a characteristic peak. 

44. The method of claim 25, further comprising determining the 
crystallinity of the patterns. 

45. The method of claim 10, further comprising subtracting the 
characteristic peaks of the first pattern from the characteristic peaks of the second pattern, 
wherein the subtraction completely removes matching peaks regardless of amplitude. 

46. The method of claim 1, further comprising visually constructing a 
fourth pattern based on operator input percentages of the first pattern and the second pattern.

47. The method of claim 46, further comprising comparing the fourth
pattern with a pattern selected from the first pattern, the second pattern, and the third
pattern.

48. A system for analyzing patterns, the system comprising:
a memory; and
a processor coupled to the memory for:
receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;
determining a first similarity between the first and the second diffraction patterns;
determining a second similarity between the first and the third diffraction patterns;
determining a third similarity between the second and the third diffraction patterns; and
performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first similarity, the second similarity, and the
third similarity.

49. The system of claim 48, further comprising normalizing the intensity
of the diffraction patterns.

50. The system of claim 49, further comprising angle truncation of the
diffraction patterns.

51. The system of claim 49, further comprising compensating the baseline
of the diffraction patterns.

52. The system of claim 49, further comprising smoothing the diffraction
patterns.

53. The system of claim 49, further comprising removing the broad
features of the diffraction patterns.

54. The system of claim 49, further comprising computing the variance of
the diffraction patterns.

55. The system of claim 49 further comprising detecting the potential 
presence of preferred orientation and particle statistics of the diffraction patterns. 

56. The system of claim 49, further comprising:
normalizing the intensity of the diffraction patterns;
truncating the angle of the diffraction patterns;
correcting the baseline of the diffraction patterns;
smoothing the diffraction patterns;
removing any broad features of the diffraction patterns;
computing the variance of the diffraction patterns; and
detecting the potential presence of preferred orientation and particle statistics of the
diffraction patterns.

57. The system of claim 1, wherein the similarities are determined based
on the characteristic peaks of the diffraction patterns.

58. The system of claim 57, wherein determining the similarities based
on the peaks comprises:
detecting crystalline peaks in the diffraction patterns; and
matching the diffraction patterns based on the detected crystalline peaks.

59. The system of claim 57, wherein determining the similarities based
on the peaks comprises:
detecting broad features in the diffraction patterns; and
matching the diffraction patterns based on the detected broad features.

60. The system of claim 57, wherein detecting characteristic peaks
further comprises:
determining the characteristic peaks of the diffraction patterns;
assigning probability scores to the determined characteristic peaks of the
diffraction pattern; and
discretely allocating the determined characteristic peaks into one or more
groups based on the assigned probability scores.

61. The system of claim 57, wherein matching the diffraction patterns
further comprises comparing one or more detected characteristic peaks in the first
diffraction pattern with one or more detected characteristic peaks in the second diffraction
pattern.

62. The system of claim 60, wherein discretely allocating the determined 
characteristic peaks comprises discretely allocating the determined characteristic peaks into 
a first, a second, a third, and a fourth group based on the assigned probability scores. 

63. The system of claim 62, wherein matching the diffraction patterns
based on the detected characteristic peaks comprises comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern.

64. The system of claim 63, wherein comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern further comprises:
for each characteristic peak in the first group of the first diffraction pattern,
comparing the characteristic peak in the first group of the first diffraction pattern with the
characteristic peaks in the first, second, or third group of the second diffraction pattern and
penalizing a matching score if the characteristic peak in the first group of the first
diffraction pattern is not found in the first, second, or third group of the second diffraction
pattern.

65. The system of claim 64, wherein comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern further comprises:
for each characteristic peak in the second group of the first diffraction
pattern, comparing the characteristic peak in the second group of the first diffraction pattern
with the characteristic peaks in the first, second, third, or fourth group of the second
diffraction pattern and penalizing a matching score if the characteristic peak in the first
group of the first diffraction pattern is not found in the first, second, third, or fourth group of
the second diffraction pattern.

66. The system of claim 63, wherein matching the diffraction patterns
based on the detected characteristic peaks further comprises comparing one or more
detected characteristic peaks in the second diffraction pattern with one or more detected
characteristic peaks in the first diffraction pattern.

67. The system of claim 48, wherein pre-processing the diffraction
patterns further comprises smoothing the diffraction patterns.

68. The system of claim 48, wherein pre-processing the diffraction
patterns further comprises scaling the diffraction patterns into a common measurement
range.

69. The system of claim 48, wherein pre-processing the diffraction
patterns further comprises scaling the diffraction patterns into a common step size.

70. The system of claim 48, wherein pre-processing the diffraction
patterns further comprises normalizing the diffraction patterns.

71. The system of claim 48, wherein pre-processing the diffraction
patterns further comprises:
smoothing the diffraction patterns;
scaling the diffraction patterns into a common measurement range;
scaling the diffraction patterns into a common step size; and
normalizing the diffraction patterns.

72. The system of claim 48, wherein determining the similarity between
the first diffraction pattern and the second diffraction pattern further comprises matching the
intensity envelopes of the first diffraction pattern with the second diffraction pattern.

73. The system of claim 71, wherein determining the similarity between
the first diffraction pattern and the second diffraction pattern further comprises matching the
intensity envelopes of the first diffraction pattern with the second diffraction pattern.

74. The system of claim 72, wherein matching the intensity envelopes
comprises performing a least squares fitting of the first diffraction pattern and the second
diffraction pattern.

75. The system of claim 48, further comprising generating the third
diffraction pattern instead of receiving the third diffraction pattern.

76. The system of claim 75, wherein the third diffraction pattern is a
simulated pattern of a disordered form and further wherein generating the third diffraction
pattern comprises:
receiving a peak list; and
simulating disorder in the received peak list to generate a disordered pattern
as the third diffraction pattern.

77. The system of claim 76, wherein simulating disorder in the received
peak list comprises:
simulating instrument parameters;
receiving an operator defined microstructure parameter;
modeling material disorder based on the received microstructure parameter; and
optimizing the disordered parameter.

78. The system of claim 48, wherein the hierarchical cluster analysis
further comprises determining a cut off similarity of a dendrogram.

79. The system of claim 68, wherein the cut off similarity is based on the
similarity of the patterns.

80. The system of claim 48, further comprising X-shifting the first
diffraction pattern prior to determining the similarity between the first diffraction pattern
and the second diffraction pattern and determining the similarity between the first
diffraction pattern and the third diffraction pattern.

81. The system of claim 48, further comprising determining preferred
orientation and particle statistics in the first diffraction pattern.

82. The system of claim 63, wherein comparing the peaks further
comprises matching a split peak with a peak having a shoulder as an acceptable match. 

83. The system of claim 57, wherein the characteristic peaks are detected 
based upon a threshold value. 

84. The system of claim 83, wherein the threshold value is based on a 
computed variance. 

85. The system of claim 83, wherein the threshold value is based on a 
noise level of the patterns. 

86. The system of claim 48, wherein performing hierarchical cluster
analysis further comprises using the minimum link methodology.

87. The system of claim 48, wherein performing hierarchical cluster
analysis further comprises using the average link methodology.

88. The system of claim 48, wherein performing hierarchical cluster
analysis further comprises using the maximum link methodology.

89. The system of claim 48, wherein hierarchical cluster analysis
comprises identifying one or more clusters and further comprising identifying clusters that
are mixtures of other clusters.

90. The system of claim 57, further comprising identifying the region of
a characteristic peak.

91. The system of claim 72, further comprising determining the
crystallinity of the patterns.



WO 2004/013622    PCT/US2003/024507



92. The system of claim 57, further comprising subtracting the
characteristic peaks of the first pattern from the characteristic peaks of the second pattern,
wherein the subtraction completely removes matching peaks regardless of amplitude.

93. The system of claim 48, further comprising visually constructing a
fourth pattern based on operator input percentages of the first pattern and the second pattern.

94. The system of claim 93, further comprising comparing the fourth
pattern with a pattern selected from the first pattern, the second pattern, and the third
pattern.

95. A machine-readable magnetic medium comprising instructions stored
on the medium, the instructions when executed perform the stages of:
receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;
determining a first similarity between the first and the second diffraction patterns;
determining a second similarity between the first and the third diffraction patterns;
determining a third similarity between the second and the third diffraction patterns; and
performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first similarity, the second similarity, and the
third similarity.

96. The machine-readable magnetic medium of claim 95, further
comprising instructions for normalizing the intensity of the diffraction patterns.

97. The machine-readable magnetic medium of claim 95, further
comprising instructions for angle truncation of the diffraction patterns.

98. The machine-readable magnetic medium of claim 95, further
comprising instructions for compensating the baseline of the diffraction patterns.

99. The machine-readable magnetic medium of claim 95, further
comprising instructions for smoothing the diffraction patterns.

100. The machine-readable magnetic medium of claim 95, further
comprising instructions for removing the broad features of the diffraction patterns.

101. The machine-readable magnetic medium of claim 95, further
comprising instructions for computing the variance of the diffraction patterns.

102. The machine-readable magnetic medium of claim 95, further
comprising instructions for detecting the potential presence of preferred orientation and
particle statistics of the diffraction patterns.

103. The machine-readable magnetic medium of claim 95, further
comprising instructions for:
normalizing the intensity of the diffraction patterns;
truncating the angle of the diffraction patterns;
correcting the baseline of the diffraction patterns;
smoothing the diffraction patterns;
removing any broad features of the diffraction patterns;
computing the variance of the diffraction patterns; and
detecting the potential presence of preferred orientation and particle statistics of the
diffraction patterns.

104. The machine-readable magnetic medium of claim 95, wherein the
similarities are determined based on the characteristic peaks of the diffraction patterns.

105. The machine-readable magnetic medium of claim 104, wherein
determining the similarities based on the peaks comprises:
detecting crystalline peaks in the diffraction patterns; and
matching the diffraction patterns based on the detected crystalline peaks.

106. The machine-readable magnetic medium of claim 104, wherein
determining the similarities based on the peaks comprises:
detecting broad features in the diffraction patterns; and
matching the diffraction patterns based on the detected broad features.

107. The machine-readable magnetic medium of claim 104, wherein
detecting characteristic peaks further comprises:
determining the characteristic peaks of the diffraction patterns;
assigning probability scores to the determined characteristic peaks of the
diffraction pattern; and
discretely allocating the determined characteristic peaks into one or more
groups based on the assigned probability scores.

108. The machine-readable magnetic medium of claim 104, wherein
matching the diffraction patterns further comprises comparing one or more detected
characteristic peaks in the first diffraction pattern with one or more detected characteristic
peaks in the second diffraction pattern.

109. The machine-readable magnetic medium of claim 107, wherein
discretely allocating the determined characteristic peaks comprises discretely allocating the
determined characteristic peaks into a first, a second, a third, and a fourth group based on
the assigned probability scores.

110. The machine-readable magnetic medium of claim 109, wherein
matching the diffraction patterns based on the detected characteristic peaks comprises
comparing one or more detected characteristic peaks in the first diffraction pattern with one
or more detected characteristic peaks in the second diffraction pattern.

111. The machine-readable magnetic medium of claim 110, wherein
comparing one or more detected characteristic peaks in the first diffraction pattern with one
or more detected characteristic peaks in the second diffraction pattern further comprises:
for each characteristic peak in the first group of the first diffraction pattern,
comparing the characteristic peak in the first group of the first diffraction pattern with the
characteristic peaks in the first, second, or third group of the second diffraction pattern and
penalizing a matching score if the characteristic peak in the first group of the first
diffraction pattern is not found in the first, second, or third group of the second diffraction
pattern.

112. The machine-readable magnetic medium of claim 111, wherein
comparing one or more detected characteristic peaks in the first diffraction pattern with one
or more detected characteristic peaks in the second diffraction pattern further comprises:

for each characteristic peak in the second group of the first diffraction
pattern, comparing the characteristic peak in the second group of the first diffraction pattern
with the characteristic peaks in the first, second, third, or fourth group of the second
diffraction pattern and penalizing a matching score if the characteristic peak in the first
group of the first diffraction pattern is not found in the first, second, third, or fourth group of
the second diffraction pattern.

113. The machine-readable magnetic medium of claim 110, wherein 
matching the diffraction patterns based on the detected characteristic peaks further 
comprises comparing one or more detected characteristic peaks in the second diffraction 
pattern with one or more detected characteristic peaks in the first diffraction pattern.

114. The machine-readable magnetic medium of claim 95, wherein
pre-processing the diffraction patterns further comprises smoothing the diffraction patterns.

115. The machine-readable magnetic medium of claim 95, wherein
pre-processing the diffraction patterns further comprises scaling into a common measurement
range the diffraction patterns.

116. The machine-readable magnetic medium of claim 95, wherein
pre-processing the diffraction patterns further comprises scaling into a common step size the
diffraction patterns.

117. The machine-readable magnetic medium of claim 95, wherein
pre-processing the diffraction patterns further comprises normalizing the diffraction patterns.

118. The machine-readable magnetic medium of claim 95, wherein
pre-processing the diffraction patterns further comprises:

smoothing the diffraction patterns;

scaling into a common measurement range the diffraction patterns;
scaling into a common step size the diffraction patterns; and
normalizing the diffraction patterns.
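
The four pre-processing stages of claim 118 can be sketched as a small pipeline. The moving-average filter, linear interpolation, overlap-based common range, and maximum-intensity normalization are assumptions chosen for illustration; the claims name the stages but not the algorithms.

```python
def smooth(y, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def resample(x, y, start, stop, step):
    """Linearly interpolate (x, y) onto start, start + step, ... up to stop."""
    n = int((stop - start) / step + 1e-9) + 1
    grid = [start + k * step for k in range(n)]
    out, j = [], 0
    for g in grid:
        # Advance to the segment [x[j], x[j + 1]] that contains g.
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, out


def normalize(y):
    """Scale intensities so the strongest point equals 1."""
    peak = max(y)
    if peak <= 0:
        raise ValueError("pattern has no positive intensity")
    return [v / peak for v in y]


def preprocess(patterns, step=0.5, window=3):
    """patterns: list of (x, y) pairs with x in ascending order."""
    start = max(x[0] for x, _ in patterns)   # common range = overlap
    stop = min(x[-1] for x, _ in patterns)
    result = []
    for x, y in patterns:
        grid, yi = resample(x, smooth(y, window), start, stop, step)
        result.append((grid, normalize(yi)))
    return result


a = ([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 10.0, 0.0, 0.0])
b = ([1.0, 1.5, 2.0, 2.5, 3.0], [0.0, 2.0, 4.0, 2.0, 0.0])
out = preprocess([a, b])
```

After this step both patterns share one grid, so point-by-point comparisons such as envelope matching become meaningful.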

119. The machine-readable magnetic medium of claim 95, wherein
determining the similarity between the first diffraction pattern and the second diffraction
pattern further comprises matching the intensity envelopes of the first diffraction pattern
with the second diffraction pattern.

120. The machine-readable magnetic medium of claim 118, wherein 
determining the similarity between the first diffraction pattern and the second diffraction
pattern further comprises matching the intensity envelopes of the first diffraction pattern
with the second diffraction pattern.

121. The machine-readable magnetic medium of claim 119, wherein
matching the intensity envelopes comprises performing a least squares fitting of the first
diffraction pattern and the second diffraction pattern.
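
A minimal sketch of the least squares fitting in claim 121 follows, assuming both patterns are already sampled on the same grid. Fitting a single scale factor and reporting one minus the relative residual as the score are illustrative assumptions; the claims do not specify the fitted parameters or the scoring formula.

```python
def envelope_similarity(a, b):
    """Fit a ~ scale * b by least squares and score the fit.

    Returns (scale, similarity), where similarity is
    1 - residual / sum(a**2) and lies in [0, 1] for the optimal scale.
    """
    if len(a) != len(b):
        raise ValueError("patterns must share a common grid")
    saa = sum(p * p for p in a)
    sbb = sum(q * q for q in b)
    sab = sum(p * q for p, q in zip(a, b))
    if saa == 0 or sbb == 0:
        raise ValueError("pattern has zero intensity everywhere")
    scale = sab / sbb  # closed-form least squares solution
    residual = sum((p - scale * q) ** 2 for p, q in zip(a, b))
    return scale, 1.0 - residual / saa


first = [0.0, 2.0, 4.0, 2.0, 0.0]
second = [0.0, 1.0, 2.0, 1.0, 0.0]   # same shape, half the intensity
third = [4.0, 2.0, 0.0, 0.0, 0.0]    # different shape

scale_12, sim_12 = envelope_similarity(first, second)
scale_13, sim_13 = envelope_similarity(first, third)
```

Because the scale is fitted, two patterns of the same shape score as identical even when their absolute intensities differ.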

122. The machine-readable magnetic medium of claim 95, further 
comprising generating the third diffraction pattern instead of receiving the third diffraction
pattern.

123. The machine-readable magnetic medium of claim 122, wherein the 
third diffraction pattern is a simulated pattern of a disordered form and further wherein
generating the third diffraction pattern comprises:

receiving a peak list; and

simulating disorder in the received peak list to generate a disordered pattern
as the third diffraction pattern.

124. The machine-readable magnetic medium of claim 123, wherein 
simulating disorder in the received peak list comprises: 

simulating instrument parameters;

receiving an operator defined microstructure parameter;

modeling material disorder based on the received microstructure parameter; and

optimizing the disordered parameter.

125. The machine-readable magnetic medium of claim 95, wherein the 
hierarchical cluster analysis further comprises determining a cut off similarity of a
dendrogram. 

126. The machine-readable magnetic medium of claim 115, wherein the 
cut off similarity is based on the similarity of the patterns. 

127. The machine-readable magnetic medium of claim 95, further 
comprising X-shifting the first diffraction pattern prior to determining the similarity 
between the first diffraction pattern and the second diffraction pattern and determining the
similarity between the first diffraction pattern and the third diffraction pattern.

128. The machine-readable magnetic medium of claim 95, further 
comprising determining preferred orientation and particle statistics in the first diffraction 
pattern.

129. The machine-readable magnetic medium of claim 110, wherein
comparing the peaks further comprises matching a split peak with a peak having a shoulder 
as an acceptable match. 

130. The machine-readable magnetic medium of claim 104, wherein the 
characteristic peaks are detected based upon a threshold value. 

131. The machine-readable magnetic medium of claim 130, wherein the 
threshold value is based on a computed variance. 

132. The machine-readable magnetic medium of claim 130, wherein the 
threshold value is based on a noise level of the patterns. 

133. The machine-readable magnetic medium of claim 95, wherein 
performing hierarchical cluster analysis further comprises using the minimum link 
methodology.

134. The machine-readable magnetic medium of claim 95, wherein
performing hierarchical cluster analysis further comprises using the average link 
methodology. 

135. The machine-readable magnetic medium of claim 95, wherein 
performing hierarchical cluster analysis further comprises using the maximum link 
methodology. 
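
The three link methodologies of claims 133 to 135 differ only in how the separation between two clusters is computed from their members. The sketch below assumes similarities in [0, 1] that are converted to distances as 1 - similarity, with minimum, average, and maximum link taken over those distances; this convention is an assumption, not something the claims define.

```python
from itertools import combinations

# Each link method reduces the list of member-to-member distances to one value.
LINKS = {
    "minimum": min,
    "maximum": max,
    "average": lambda d: sum(d) / len(d),
}


def cluster(similarity, link="average"):
    """Agglomerative clustering of a symmetric similarity matrix.

    Returns the merge history as (members_a, members_b, similarity_at_merge),
    which is the information a dendrogram is drawn from.
    """
    combine = LINKS[link]

    def distance(a, b):
        return combine([1.0 - similarity[i][j] for i in a for j in b])

    clusters = [(i,) for i in range(len(similarity))]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters that are closest under the chosen link.
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((a, b, 1.0 - distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Made-up pairwise similarity scores for three patterns.
scores = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
history = {name: cluster(scores, name) for name in LINKS}
```

With these scores all three methods join patterns 0 and 1 first; they differ in the similarity at which pattern 2 joins, which is what shifts the branch heights of the dendrogram.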

136. The machine-readable magnetic medium of claim 95, wherein 
hierarchical cluster analysis comprises identifying one or more clusters and further 
comprising identifying clusters that are mixtures of other clusters. 

137. The machine-readable magnetic medium of claim 104, further
comprising identifying the region of a characteristic peak. 

138. The machine-readable magnetic medium of claim 119, further 
comprising determining the crystallinity of the patterns. 

139. The machine-readable magnetic medium of claim 104, further 
comprising subtracting the characteristic peaks of the first pattern from the characteristic 
peaks of the second pattern, wherein the subtraction completely removes matching peaks 
regardless of amplitude. 
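
The amplitude-independent subtraction of claim 139 can be pictured as a filter on peak lists. The sketch assumes peaks are (position, amplitude) pairs and that two peaks match when their positions differ by no more than a tolerance; the tolerance value is hypothetical.

```python
def subtract_peaks(first, second, tolerance=0.2):
    """Return the peaks of `second` that have no counterpart in `first`.

    A peak of `second` is dropped whenever some peak of `first` lies within
    `tolerance` of its position. Amplitudes are deliberately ignored, so a
    weak peak in `first` fully removes a strong matching peak in `second`.
    """
    positions = [pos for pos, _ in first]
    return [
        (pos, amp)
        for pos, amp in second
        if all(abs(pos - p) > tolerance for p in positions)
    ]


first_peaks = [(10.0, 100.0), (20.0, 50.0)]
second_peaks = [(10.1, 5.0), (15.0, 80.0), (20.0, 500.0)]
remaining = subtract_peaks(first_peaks, second_peaks)
```

Only the peak at 15.0 survives, even though the peak at 20.0 is ten times stronger in the second pattern than in the first.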

140. The machine-readable magnetic medium of claim 95, further 
comprising visually constructing a fourth pattern based on operator input percentages of the 
first pattern and the second pattern.

141. The machine-readable magnetic medium of claim 140, further
comprising comparing the fourth pattern with a pattern selected from the first pattern, the
second pattern, and the third pattern.

142. A method of analyzing patterns, comprising: 
receiving a first diffraction pattern;

receiving a second diffraction pattern;
receiving a third diffraction pattern;

determining a first similarity between the first and the second diffraction
patterns based on the characteristic peaks of the first and the second diffraction patterns;

determining a second similarity between the first and the third diffraction
patterns based on the characteristic peaks of the first and the third diffraction patterns;

determining a third similarity between the second and the third diffraction
patterns based on the characteristic peaks of the second and the third diffraction patterns;
and

performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first, the second, and the third similarity.

143. A system for analyzing patterns, the system comprising:
a memory; and

a processor coupled to the memory for:
receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;

determining a first similarity between the first and the second diffraction
patterns based on the characteristic peaks of the first and the second diffraction patterns;

determining a second similarity between the first and the third diffraction
patterns based on the characteristic peaks of the first and the third diffraction patterns;

determining a third similarity between the second and the third diffraction
patterns based on the characteristic peaks of the second and the third diffraction patterns;
and

performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first, the second, and the third similarity.

144. A machine-readable magnetic medium comprising instructions stored 
on the medium, the instructions when executed perform the stages of:

receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;

determining a first similarity between the first and the second diffraction
patterns based on the characteristic peaks of the first and the second diffraction patterns;

determining a second similarity between the first and the third diffraction
patterns based on the characteristic peaks of the first and the third diffraction patterns;

determining a third similarity between the second and the third diffraction
patterns based on the characteristic peaks of the second and the third diffraction patterns;
and

performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first, the second, and the third similarity.

145. A method of analyzing patterns, comprising: 
receiving a first diffraction pattern; 

receiving a second diffraction pattern; 
receiving a third diffraction pattern; 

determining a first similarity between the first and the second diffraction
patterns based on the intensity envelopes of the first and the second diffraction patterns;

determining a second similarity between the first and the third diffraction
patterns based on the intensity envelopes of the first and the third diffraction patterns;

determining a third similarity between the second and the third diffraction
patterns based on the intensity envelopes of the second and the third diffraction patterns;
and

performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first, the second, and the third similarity.

146. A system for analyzing patterns, the system comprising:
a memory; and

a processor coupled to the memory for:
receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;

determining a first similarity between the first and the second diffraction
patterns based on the intensity envelopes of the first and the second diffraction patterns;

determining a second similarity between the first and the third diffraction
patterns based on the intensity envelopes of the first and the third diffraction patterns;

determining a third similarity between the second and the third diffraction
patterns based on the intensity envelopes of the second and the third diffraction patterns;
and

performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first, the second, and the third similarity.

147. A machine-readable magnetic medium comprising instructions stored 
on the medium, the instructions when executed perform the stages of:

receiving a first diffraction pattern;
receiving a second diffraction pattern;
receiving a third diffraction pattern;

determining a first similarity between the first and the second diffraction
patterns based on the intensity envelopes of the first and the second diffraction patterns;

determining a second similarity between the first and the third diffraction
patterns based on the intensity envelopes of the first and the third diffraction patterns;

determining a third similarity between the second and the third diffraction
patterns based on the intensity envelopes of the second and the third diffraction patterns;
and

performing hierarchical cluster analysis on the first, the second, and the third
diffraction pattern based on the determined first, the second, and the third similarity.

148. A method of analyzing a pattern of a disordered form, comprising: 
receiving a diffraction pattern of the disordered form;

simulating a simulated disordered form based on the peak list of the ordered
form; and

matching the simulated disordered form to the diffraction pattern of the
disordered form.

149. A system for analyzing a pattern of a disordered form, the system
comprising:

a memory; and

a processor coupled to the memory for:

receiving a diffraction pattern of the disordered form;

simulating a simulated disordered form based on the peak list of the ordered
form; and

matching the simulated disordered form to the diffraction pattern of the
disordered form.

150. A machine-readable magnetic medium comprising instructions stored 
on the medium, the instructions when executed perform the stages of:

receiving a diffraction pattern of a disordered form;

simulating a simulated disordered form based on the peak list of the ordered
form; and

matching the simulated disordered form to the diffraction pattern of the
disordered form.

151. A method of matching patterns, comprising: 
performing pattern matching on three or more patterns to determine
similarities between the patterns; and

performing hierarchical cluster analysis on the three or more patterns based
on the determined similarities.

152. A method of solid form screening, comprising:

solidifying a material under a first condition to generate a first resulting
solid;

solidifying a material under a second condition to generate a second resulting
solid;

analyzing the first resulting solid and the second resulting solid by
diffraction analysis to generate a respective first diffraction pattern and a second diffraction
pattern;

determining a similarity between the first diffraction pattern and the second
diffraction pattern; and

performing hierarchical cluster analysis using the similarity.

153. A system for performing solid form screening, the system
comprising:

a memory; and

a processor coupled to the memory for:

solidifying a material under a first condition to generate a first resulting
solid;

solidifying a material under a second condition to generate a second resulting
solid;

analyzing the first resulting solid and the second resulting solid by
diffraction analysis to generate a respective first diffraction pattern and a second diffraction
pattern;

determining a similarity between the first diffraction pattern and the second
diffraction pattern; and

performing hierarchical cluster analysis using the similarity.

154. A machine-readable magnetic medium comprising instructions stored 
on the medium, the instructions when executed perform the stages of:

solidifying a material under a first condition to generate a first resulting
solid;

solidifying a material under a second condition to generate a second resulting
solid;

analyzing the first resulting solid and the second resulting solid by
diffraction analysis to generate a respective first diffraction pattern and a second diffraction
pattern;

determining a similarity between the first diffraction pattern and the second
diffraction pattern; and

performing hierarchical cluster analysis using the similarity.